The linear regression model for hour 4 includes the predictors
lag_1_production, DLWRF_surface,
TMP_surface, hourly_cloud_average,
special_period, trend_hour_4, and month. This model
aims to capture the production pattern at 4 AM, when production is
close to zero because there is essentially no solar activity.
#Production data for hour 4
hour_4_data <- all_data[all_data$hour == 4, ]
# Plot production for hour 4
ggplot(hour_4_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 4",
x = "Date",
y = "Production") +
theme_minimal()
# Filter data for hour 4
hour_4_data <- all_data[all_data$hour == 4, ]
hour_4_data$trend_hour_4 <- 1:nrow(hour_4_data)
hour_4_data[, lag_1_production := shift(production,1)]
hour_4_data[,lag_1_diff:=production-lag_1_production]
hour_4_data <- hour_4_data[!is.na(lag_1_production)]
# Fit linear regression model for hour 4
lm_hour_4 <- lm(production ~+lag_1_production +DLWRF_surface+TMP_surface+hourly_cloud_average+special_period+trend_hour_4+month, data = hour_4_data)
# Summarize the model
summary(lm_hour_4)
##
## Call:
## lm(formula = production ~ +lag_1_production + DLWRF_surface +
## TMP_surface + hourly_cloud_average + special_period + trend_hour_4 +
## month, data = hour_4_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.119491 -0.001286 -0.000220 0.000614 0.173167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.774e-02 5.524e-02 0.502 0.61569
## lag_1_production 6.867e-01 2.490e-02 27.577 < 2e-16 ***
## DLWRF_surface 2.489e-05 3.890e-05 0.640 0.52241
## TMP_surface -1.206e-04 2.290e-04 -0.527 0.59851
## hourly_cloud_average -3.182e-05 2.898e-05 -1.098 0.27252
## special_period -6.329e-03 1.534e-03 -4.125 4.09e-05 ***
## trend_hour_4 6.630e-07 2.222e-06 0.298 0.76548
## monthAug 1.974e-03 2.669e-03 0.740 0.45964
## monthDec -3.856e-04 2.204e-03 -0.175 0.86116
## monthFeb -2.568e-04 2.290e-03 -0.112 0.91074
## monthJan -1.861e-04 2.154e-03 -0.086 0.93119
## monthJul 7.587e-03 2.557e-03 2.968 0.00309 **
## monthJun 1.589e-02 2.620e-03 6.065 2.01e-09 ***
## monthMar -1.008e-04 2.027e-03 -0.050 0.96035
## monthMay -2.629e-04 1.990e-03 -0.132 0.89493
## monthNov 1.310e-03 2.153e-03 0.608 0.54324
## monthOct 2.622e-03 2.220e-03 1.181 0.23796
## monthSep 2.288e-03 2.389e-03 0.958 0.33855
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0117 on 827 degrees of freedom
## Multiple R-squared: 0.6841, Adjusted R-squared: 0.6776
## F-statistic: 105.4 on 17 and 827 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_4)
##
## Breusch-Godfrey test for serial correlation of order up to 21
##
## data: Residuals
## LM test = 422.17, df = 21, p-value < 2.2e-16
Coefficients and Significance:
Lag 1 Production: Highly significant with a positive impact. Because the lag is computed within the hour-4 subset, it is the previous observation at hour 4 (the previous day when no days are missing), so the prior day's value at this hour heavily influences current production.
DLWRF Surface, TMP Surface, Hourly Cloud Average: Not statistically significant (all p > 0.25), indicating minimal impact on the prediction for this hour.
Special Period: Significant and negative, indicating that production at hour 4 is lower during the special period.
Monthly Effects: June and July have significant positive coefficients, indicating seasonal variation in production.
Residuals and Diagnostics:
Residual Standard Error: 0.0117, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.6841, suggesting that the model explains about 68.41% of the variability in production.
Adjusted R-squared: 0.6776, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 4 was found to be 1407.11%, indicating an inaccurate model. Because production at this hour is close to zero, even small absolute errors are likely to translate into a very large percentage error.
Hourly Production Data for Hour 4:
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of larger residuals, that is, times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Shows autocorrelation in the residuals, consistent with the Breusch-Godfrey test above (p < 2.2e-16), which suggests that some patterns in the data are not captured by the model.
Histogram (Distribution of Residuals): Residuals are centered around zero with some deviations, highlighting cases where predictions are off.
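The very large WMAPE reported for hour 4 is easier to read with the formula in hand. The sketch below assumes the usual definition (total absolute error divided by total absolute actuals, in percent); the report does not show how WMAPE was computed, so the `wmape` helper and the toy numbers are illustrative assumptions, not the original code or data.

```r
# Weighted mean absolute percentage error, assuming the common definition:
# 100 * sum(|actual - forecast|) / sum(|actual|).
wmape <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  100 * sum(abs(actual - forecast)) / sum(abs(actual))
}

# Hypothetical night-time values: tiny absolute errors, near-zero actuals.
night_actual   <- c(0.00, 0.00, 0.01, 0.00)
night_forecast <- c(0.02, 0.03, 0.05, 0.04)
wmape_night <- wmape(night_actual, night_forecast)

# Hypothetical daytime values: larger absolute errors, sizeable actuals.
day_actual   <- c(8, 10, 12, 10)
day_forecast <- c(7, 11, 10, 12)
wmape_day <- wmape(day_actual, day_forecast)

print(c(night = wmape_night, day = wmape_day))
```

In this toy case the night-time errors are far smaller in absolute terms, yet the percentage error is much larger because the denominator is almost zero. For the early-morning hours, an absolute error measure may therefore be more informative than WMAPE.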
For hour 5, a similar approach to the one used for hour 4 was
applied. The linear regression model included the predictors
lag_1_production, DLWRF_surface,
is.ramadan, special_period,
trend_hour_5, and interactions between month
and hourly_max_t.
#Production data for hour 5
hour_5_data <- all_data[all_data$hour == 5, ]
# Plot production for hour 5
ggplot(hour_5_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 5",
x = "Date",
y = "Production") +
theme_minimal()
# Filter data for hour 5
hour_5_data <- all_data[all_data$hour == 5, ]
hour_5_data <- hour_5_data[,-c(2)]
hour_5_data[, lag_1_production := shift(production,1)]
hour_5_data[,lag_1_diff:=production-lag_1_production]
hour_5_data <- hour_5_data[!is.na(lag_1_production)]
hour_5_data$trend_hour_5 <- 1:nrow(hour_5_data)
# Fit linear regression model for hour 5
lm_hour_5 <- lm(production ~+lag_1_production+DLWRF_surface+is.ramadan+special_period+trend_hour_5 +month*hourly_max_t, data = hour_5_data)
# Summarize the model
summary(lm_hour_5)
##
## Call:
## lm(formula = production ~ +lag_1_production + DLWRF_surface +
## is.ramadan + special_period + trend_hour_5 + month * hourly_max_t,
## data = hour_5_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81132 -0.02527 -0.00681 0.01221 1.77736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.486e+00 1.357e+00 -1.095 0.2739
## lag_1_production 4.148e-01 3.170e-02 13.084 < 2e-16 ***
## DLWRF_surface 1.603e-04 2.449e-04 0.655 0.5129
## is.ramadan -2.878e-02 2.062e-02 -1.396 0.1632
## special_period -1.252e-01 1.648e-02 -7.596 8.35e-14 ***
## trend_hour_5 5.244e-05 2.256e-05 2.324 0.0204 *
## monthAug -8.013e-01 2.875e+00 -0.279 0.7805
## monthDec 2.008e+00 1.932e+00 1.039 0.2989
## monthFeb 1.969e+00 1.454e+00 1.354 0.1761
## monthJan 2.122e+00 1.469e+00 1.444 0.1491
## monthJul -3.006e+00 2.497e+00 -1.204 0.2291
## monthJun -5.780e+00 3.416e+00 -1.692 0.0910 .
## monthMar 1.831e+00 1.441e+00 1.271 0.2042
## monthMay 1.819e+00 2.071e+00 0.878 0.3800
## monthNov 2.996e+00 1.954e+00 1.534 0.1255
## monthOct 3.060e+00 1.987e+00 1.540 0.1239
## monthSep 1.814e+00 2.040e+00 0.889 0.3741
## hourly_max_t 5.309e-03 4.954e-03 1.072 0.2842
## monthAug:hourly_max_t 2.920e-03 1.009e-02 0.290 0.7723
## monthDec:hourly_max_t -7.475e-03 7.017e-03 -1.065 0.2870
## monthFeb:hourly_max_t -7.337e-03 5.268e-03 -1.393 0.1641
## monthJan:hourly_max_t -7.891e-03 5.322e-03 -1.483 0.1385
## monthJul:hourly_max_t 1.097e-02 8.822e-03 1.244 0.2140
## monthJun:hourly_max_t 2.076e-02 1.207e-02 1.720 0.0858 .
## monthMar:hourly_max_t -6.767e-03 5.201e-03 -1.301 0.1936
## monthMay:hourly_max_t -6.724e-03 7.421e-03 -0.906 0.3651
## monthNov:hourly_max_t -1.091e-02 7.053e-03 -1.547 0.1223
## monthOct:hourly_max_t -1.094e-02 7.112e-03 -1.538 0.1243
## monthSep:hourly_max_t -6.377e-03 7.256e-03 -0.879 0.3798
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1175 on 816 degrees of freedom
## Multiple R-squared: 0.5733, Adjusted R-squared: 0.5587
## F-statistic: 39.16 on 28 and 816 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_5)
##
## Breusch-Godfrey test for serial correlation of order up to 32
##
## data: Residuals
## LM test = 198.33, df = 32, p-value < 2.2e-16
plot(lm_hour_5)
Coefficients and Significance:
Lag 1 Production: Highly significant with a positive impact. Because the lag is computed within the hour-5 subset, it is the previous observation at hour 5 (the previous day when no days are missing), so the prior day's value at this hour strongly influences current production.
DLWRF Surface: Not statistically significant (p = 0.51), indicating a minimal direct impact on the predictions for this hour.
Is Ramadan: Negative coefficient but not statistically significant (p = 0.16), so the model gives no clear evidence of lower production during Ramadan.
Special Period: Highly significant and negative, indicating lower production during this period.
Trend Hour 5: Significant at the 5% level (p = 0.020) with a positive coefficient, suggesting a gradual increase in production over time.
Monthly Effects: No month or month-by-hourly_max_t interaction is significant at the 5% level; only June and its interaction with hourly_max_t are borderline (p ≈ 0.09), so evidence of seasonal variation at this hour is weak.
Residuals and Diagnostics:
Residual Standard Error: 0.1175, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.5733, suggesting that the model explains about 57.33% of the variability in production.
Adjusted R-squared: 0.5587, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 5 was found to be 154.10%, indicating an inaccurate model.
Hourly Production Data for Hour 5:
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
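Since each hourly model builds lag_1_production by shifting production inside a table that has already been filtered to a single hour, the lag refers to the previous row of that subset. The sketch below illustrates this on made-up data using base R only (`c(NA, head(x, -1))` stands in for `shift(production, 1)`); the `toy` data frame and its values are hypothetical.

```r
# Toy data: 3 days x 24 hours; production values are invented so that each
# (date, hour) pair is easy to recognise: hour + 100 * day index.
dates <- as.Date("2024-01-01") + 0:2
toy <- data.frame(
  date = rep(dates, each = 24),
  hour = rep(0:23, times = 3)
)
toy$production <- toy$hour + 100 * as.integer(toy$date - min(toy$date))

# Keep only hour 5, ordered by date, as in the per-hour models.
h5 <- toy[toy$hour == 5, ]
h5 <- h5[order(h5$date), ]

# Base-R stand-in for shift(production, 1): move values down by one row.
h5$lag_1_production <- c(NA, head(h5$production, -1))

# Compare the lag on day 3 with two candidate interpretations.
lag_on_day_3       <- h5$lag_1_production[h5$date == as.Date("2024-01-03")]
prev_day_same_hour <- toy$production[toy$date == as.Date("2024-01-02") & toy$hour == 5]
prev_hour_same_day <- toy$production[toy$date == as.Date("2024-01-03") & toy$hour == 4]

print(c(lag = lag_on_day_3,
        prev_day_same_hour = prev_day_same_hour,
        prev_hour_same_day = prev_hour_same_day))
```

The lag matches the same hour on the previous day, not the previous hour of the same day. A previous-hour feature would have to be computed on the full table before filtering by hour.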
For hour 6, the linear regression model included the predictors
lag_1_production, USWRF_top_of_atmosphere,
wday, hourly_cloud_average,
is.ramadan, special_period,
is.religousday, and interactions between month
and hourly_max_t.
hour_6_data <- all_data[all_data$hour == 6, ]
# Plot production for hour 6
ggplot(hour_6_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 6",
x = "Date",
y = "Production") +
theme_minimal()
# Filter data for hour 6
hour_6_data <- all_data[all_data$hour == 6, ]
hour_6_data <- hour_6_data[, -c(2)]
# Convert the data frame to a data.table
setDT(hour_6_data)
# Create a lagged variable for production with a lag of 1 period
hour_6_data[, lag_1_production := shift(production,1)]
hour_6_data[,lag_1_diff:=production-lag_1_production]
hour_6_data <- hour_6_data[!is.na(lag_1_production)]
hour_6_data$trend_hour_6 <- 1:nrow(hour_6_data)
# Fit linear regression model for hour 6
lm_hour_6 <- lm(production~+lag_1_production+USWRF_top_of_atmosphere+wday+ hourly_cloud_average+is.ramadan+special_period+is.religousday+month*hourly_max_t, data = hour_6_data)
#lm_hour_6 <- lm(production ~ +uswrf_top_of_atmosphere + is.ramadan +is.weekend+ is.religousday +is.publicholiday + month*hourly_max_t , data = hour_6_data)
# Summarize the model
summary(lm_hour_6)
##
## Call:
## lm(formula = production ~ +lag_1_production + USWRF_top_of_atmosphere +
## wday + hourly_cloud_average + is.ramadan + special_period +
## is.religousday + month * hourly_max_t, data = hour_6_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5352 -0.2127 -0.0409 0.1283 4.6541
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.329e+01 6.793e+00 -1.956 0.0508 .
## lag_1_production 4.110e-01 3.288e-02 12.501 <2e-16 ***
## USWRF_top_of_atmosphere 5.117e-02 3.858e-02 1.326 0.1851
## wdayMon 6.914e-02 7.718e-02 0.896 0.3707
## wdaySat 1.176e-01 7.711e-02 1.526 0.1275
## wdaySun 7.834e-04 7.685e-02 0.010 0.9919
## wdayThu 4.277e-02 7.722e-02 0.554 0.5798
## wdayTue -2.025e-02 7.721e-02 -0.262 0.7932
## wdayWed 1.785e-01 7.753e-02 2.303 0.0216 *
## hourly_cloud_average -2.360e-03 9.317e-04 -2.533 0.0115 *
## is.ramadan -7.392e-02 1.034e-01 -0.715 0.4750
## special_period -7.734e-01 7.800e-02 -9.915 <2e-16 ***
## is.religousday 1.333e-01 1.297e-01 1.027 0.3046
## monthAug 1.157e+01 1.445e+01 0.800 0.4238
## monthDec 8.033e+00 9.868e+00 0.814 0.4159
## monthFeb 1.166e+01 7.449e+00 1.565 0.1180
## monthJan 1.184e+01 7.522e+00 1.573 0.1160
## monthJul 1.323e+01 1.291e+01 1.025 0.3056
## monthJun 2.234e+01 1.765e+01 1.266 0.2060
## monthMar 1.069e+01 7.422e+00 1.441 0.1501
## monthMay 1.781e+01 1.075e+01 1.657 0.0979 .
## monthNov 1.516e+01 1.001e+01 1.515 0.1302
## monthOct 2.115e+01 1.005e+01 2.105 0.0356 *
## monthSep 1.213e+01 1.038e+01 1.169 0.2428
## hourly_max_t 4.981e-02 2.449e-02 2.034 0.0423 *
## monthAug:hourly_max_t -4.072e-02 5.079e-02 -0.802 0.4230
## monthDec:hourly_max_t -3.044e-02 3.587e-02 -0.849 0.3963
## monthFeb:hourly_max_t -4.352e-02 2.701e-02 -1.612 0.1074
## monthJan:hourly_max_t -4.425e-02 2.727e-02 -1.623 0.1051
## monthJul:hourly_max_t -4.523e-02 4.558e-02 -0.992 0.3213
## monthJun:hourly_max_t -7.796e-02 6.211e-02 -1.255 0.2098
## monthMar:hourly_max_t -3.931e-02 2.682e-02 -1.466 0.1430
## monthMay:hourly_max_t -6.496e-02 3.864e-02 -1.681 0.0931 .
## monthNov:hourly_max_t -5.563e-02 3.617e-02 -1.538 0.1245
## monthOct:hourly_max_t -7.557e-02 3.600e-02 -2.099 0.0361 *
## monthSep:hourly_max_t -4.255e-02 3.699e-02 -1.150 0.2504
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5959 on 809 degrees of freedom
## Multiple R-squared: 0.6395, Adjusted R-squared: 0.6239
## F-statistic: 41 on 35 and 809 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_6)
##
## Breusch-Godfrey test for serial correlation of order up to 39
##
## data: Residuals
## LM test = 147.51, df = 39, p-value = 1.595e-14
plot(lm_hour_6)
Coefficients and Significance:
Lag 1 Production: Highly significant with a positive impact. Because the lag is computed within the hour-6 subset, it is the previous observation at hour 6 (the previous day when no days are missing), so the prior day's value at this hour strongly influences current production.
USWRF Top of Atmosphere: Not statistically significant (p = 0.19), indicating a minimal direct impact on the predictions for this hour.
Wday: Only Wednesday differs significantly from the baseline day at the 5% level (p = 0.022), which gives weak evidence of a weekly pattern in production.
Hourly Cloud Average: Significant and negative (p = 0.012), indicating that higher cloud coverage reduces production.
Is Ramadan: Negative coefficient but not statistically significant (p = 0.48), so the model gives no clear evidence of lower production during Ramadan.
Special Period: Highly significant and negative, indicating lower production during this period.
Monthly Effects: October and its interaction with
hourly_max_t are significant at the 5% level (May is borderline),
pointing to some seasonal variation in production.
Residuals and Diagnostics:
Residual Standard Error: 0.5959, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.6395, suggesting that the model explains about 63.95% of the variability in production.
Adjusted R-squared: 0.6239, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 6 was found to be 66.29%, a large improvement over hours 4 and 5 but still a sizeable relative error.
Hourly Production Data for Hour 6:
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
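The relationship between Multiple and Adjusted R-squared quoted in these diagnostics can be checked by hand. The sketch below applies the standard adjustment formula to the hour 6 figures printed by summary(lm_hour_6); the `adj_r2` helper is a hypothetical name introduced here, and the sample size is inferred from the 809 residual degrees of freedom and 35 predictors.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
# (intercept not counted in p).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Hour 6: residual df = n - p - 1 = 809 with p = 35, so n = 845.
p_hour_6 <- 35
n_hour_6 <- 809 + p_hour_6 + 1

adj_hour_6 <- adj_r2(0.6395, n = n_hour_6, p = p_hour_6)
print(round(adj_hour_6, 4))
```

With 35 predictors the penalty is modest here because the sample is large relative to the number of estimated coefficients.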
For hour 7, the linear regression model included the predictors
lag_1_production, trend_hour_7,
special_period, DLWRF_surface,
hourly_cloud_average, month, and
hourly_max_t.
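# Filter data for hour 7 so that it exists before plotting
hour_7_data <- all_data[all_data$hour == 7, ]
# Plot production for hour 7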
ggplot(hour_7_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 7",
x = "Date",
y = "Production") +
theme_minimal()
library(data.table)
# Filter data for hour 7 and remove the hour column
hour_7_data <- all_data[all_data$hour == 7, ]
hour_7_data <- hour_7_data[, -c(2)]
# Create a trend variable for hour 7
hour_7_data$trend_hour_7 <- 1:nrow(hour_7_data)
# Convert the data frame to a data.table
setDT(hour_7_data)
# Create a lagged variable for production with a lag of 1 period
hour_7_data[, lag_1_production := shift(production,1)]
hour_7_data[,lag_1_diff:=production-lag_1_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_7_data <- hour_7_data[!is.na(lag_1_production)]
# Fit linear regression model for hour 7 including the lagged variable
lm_hour_7 <- lm(production ~+lag_1_production+trend_hour_7 + special_period + DLWRF_surface + hourly_cloud_average + month+hourly_max_t,data=hour_7_data)
summary(lm_hour_7)
##
## Call:
## lm(formula = production ~ +lag_1_production + trend_hour_7 +
## special_period + DLWRF_surface + hourly_cloud_average + month +
## hourly_max_t, data = hour_7_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4967 -0.6421 0.0727 0.7645 8.9775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.746e+01 5.775e+00 -3.024 0.002573 **
## lag_1_production 3.037e-01 3.142e-02 9.665 < 2e-16 ***
## trend_hour_7 4.775e-04 2.486e-04 1.921 0.055064 .
## special_period -6.060e-01 1.710e-01 -3.543 0.000418 ***
## DLWRF_surface -1.576e-02 4.230e-03 -3.725 0.000209 ***
## hourly_cloud_average -1.027e-02 3.479e-03 -2.951 0.003256 **
## monthAug -6.492e-01 3.159e-01 -2.055 0.040195 *
## monthDec -2.015e+00 2.810e-01 -7.171 1.65e-12 ***
## monthFeb -1.207e+00 2.819e-01 -4.280 2.09e-05 ***
## monthJan -2.004e+00 2.745e-01 -7.302 6.66e-13 ***
## monthJul 3.540e-01 2.999e-01 1.180 0.238195
## monthJun 4.946e-01 2.870e-01 1.724 0.085146 .
## monthMar -4.739e-01 2.467e-01 -1.921 0.055093 .
## monthMay -7.938e-02 2.426e-01 -0.327 0.743572
## monthNov -1.088e+00 2.643e-01 -4.118 4.20e-05 ***
## monthOct 1.606e-01 2.652e-01 0.605 0.545018
## monthSep 1.085e-01 2.786e-01 0.390 0.696995
## hourly_max_t 8.783e-02 2.385e-02 3.683 0.000245 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.366 on 827 degrees of freedom
## Multiple R-squared: 0.6479, Adjusted R-squared: 0.6407
## F-statistic: 89.52 on 17 and 827 DF, p-value: < 2.2e-16
# Check residuals
library(forecast)
checkresiduals(lm_hour_7)
##
## Breusch-Godfrey test for serial correlation of order up to 21
##
## data: Residuals
## LM test = 57.848, df = 21, p-value = 2.689e-05
# Plot the model
plot(lm_hour_7)
Coefficients and Significance:
Lag 1 Production: Highly significant with a positive impact. Because the lag is computed within the hour-7 subset, it is the previous observation at hour 7 (the previous day when no days are missing), so the prior day's value at this hour strongly influences current production.
Trend Hour 7: Positive but only borderline significant (p = 0.055), giving weak evidence of a gradual increase in production over time.
Special Period: Highly significant and negative, indicating lower production during this period.
DLWRF Surface: Significant and negative, indicating that higher downward longwave radiation is associated with lower production.
Hourly Cloud Average: Significant and negative, indicating that higher cloud coverage reduces production.
Monthly Effects: August, November, December, January, and February have significant negative coefficients, highlighting seasonal variation in production.
Hourly Max T: Significant and positive, indicating that higher temperatures are associated with higher production. This model has no month interaction, so the effect is the same in every month.
Residuals and Diagnostics:
Residual Standard Error: 1.366, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.6479, suggesting that the model explains about 64.79% of the variability in production.
Adjusted R-squared: 0.6407, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 7 was found to be 29.76%, indicating a reasonably accurate model.
Hourly Production Data for Hour 7:
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
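The residual autocorrelation discussed for these models is tested in the report with checkresiduals(), which prints a Breusch-Godfrey test. As a dependency-free illustration of the same idea, the sketch below swaps in the Ljung-Box test from base R's stats package (a different test, used here only to keep the example self-contained) and applies it to a regression on simulated data whose errors follow a hypothetical AR(1) process.

```r
set.seed(1)
n <- 300

# Simulated predictor and AR(1) errors (coefficient 0.7 is an arbitrary choice).
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 1 + 2 * x + e

# Ordinary least squares ignores the serial dependence in the errors.
fit <- lm(y ~ x)

# Ljung-Box test on the residuals: a small p-value indicates autocorrelation
# that the regression has not captured.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
print(lb$p.value)
```

A small p-value here has the same practical reading as the Breusch-Godfrey results above: coefficient standard errors from lm() may be too optimistic, and extra lag terms or an explicit error model could be worth considering.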
For hour 8, the linear regression model included the predictors lag_1_production, trend_hour_8, CSNOW_surface, DLWRF_surface, hourly_cloud_average, is.ramadan, and interactions between month and hourly_max_t.
Although the residual ACF plot for hour 8 still shows notable autocorrelation at higher lags, no additional lag terms were added, in order to avoid multicollinearity.
The same approach was followed for the other hours.
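The multicollinearity concern behind leaving out further lag terms can be quantified with variance inflation factors (VIF). The report does not show such a check, so the following is only a sketch on simulated data: `vif_manual` is a hypothetical helper that computes VIF as 1 / (1 - R²) from auxiliary regressions, and the AR(1) series stands in for a persistent production series.

```r
# VIF for each column: regress it on all other columns and use 1 / (1 - R^2).
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    aux <- lm(reformulate(others, response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

set.seed(42)
n <- 500

# Hypothetical persistent series; consecutive lags are strongly correlated.
series <- as.numeric(arima.sim(model = list(ar = 0.9), n = n + 2))
predictors <- data.frame(
  lag_1 = series[2:(n + 1)],
  lag_2 = series[1:n],
  noise = rnorm(n)  # unrelated predictor for comparison
)

vifs <- vif_manual(predictors)
print(round(vifs, 2))
```

In this simulation the two lag columns inflate each other's variance while the unrelated predictor stays near 1. Whether the real lag terms are similarly collinear would need to be checked on the actual data.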
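# Filter data for hour 8 so that it exists before plotting
hour_8_data <- all_data[all_data$hour == 8, ]
# Plot production for hour 8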
ggplot(hour_8_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 8",
x = "Date",
y = "Production") +
theme_minimal()
# Filter data for hour 8
hour_8_data <- all_data[all_data$hour == 8, ]
hour_8_data <- hour_8_data[,-c(2)]
hour_8_data$trend_hour_8 <- 1:nrow(hour_8_data)
# Create a lagged variable for production with a lag of 1 period
hour_8_data[, lag_1_production := shift(production,1)]
hour_8_data[,lag_1_diff:=production-lag_1_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_8_data <- hour_8_data[!is.na(lag_1_production)]
#Fit linear regression model for hour 8
lm_hour_8 <- lm(production ~ +lag_1_production+trend_hour_8+ CSNOW_surface+DLWRF_surface+hourly_cloud_average+is.ramadan+month*hourly_max_t, data = hour_8_data)
# Summarize the model
summary(lm_hour_8)
##
## Call:
## lm(formula = production ~ +lag_1_production + trend_hour_8 +
## CSNOW_surface + DLWRF_surface + hourly_cloud_average + is.ramadan +
## month * hourly_max_t, data = hour_8_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7023 -0.9553 0.1940 1.0672 4.7857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.779e+00 1.668e+01 -0.227 0.820830
## lag_1_production 1.923e-01 2.934e-02 6.552 1.00e-10 ***
## trend_hour_8 1.017e-03 2.879e-04 3.532 0.000436 ***
## CSNOW_surface -1.251e+00 4.430e-01 -2.823 0.004871 **
## DLWRF_surface -2.500e-02 5.239e-03 -4.771 2.17e-06 ***
## hourly_cloud_average -2.636e-02 4.731e-03 -5.572 3.42e-08 ***
## is.ramadan -4.809e-01 3.227e-01 -1.490 0.136525
## monthAug 9.521e+00 3.833e+01 0.248 0.803901
## monthDec -5.832e+01 2.807e+01 -2.078 0.038046 *
## monthFeb -2.835e+01 1.920e+01 -1.476 0.140240
## monthJan -3.938e+01 1.973e+01 -1.996 0.046273 *
## monthJul -2.735e+00 3.322e+01 -0.082 0.934418
## monthJun -2.855e+00 3.410e+01 -0.084 0.933309
## monthMar -2.354e+01 1.966e+01 -1.197 0.231663
## monthMay 3.804e+00 2.449e+01 0.155 0.876610
## monthNov -1.300e+01 2.601e+01 -0.500 0.617249
## monthOct -1.184e+01 2.713e+01 -0.437 0.662492
## monthSep 3.274e+01 2.576e+01 1.271 0.204133
## hourly_max_t 5.661e-02 5.988e-02 0.945 0.344725
## monthAug:hourly_max_t -3.199e-02 1.297e-01 -0.247 0.805149
## monthDec:hourly_max_t 2.065e-01 1.016e-01 2.033 0.042423 *
## monthFeb:hourly_max_t 1.007e-01 6.877e-02 1.464 0.143456
## monthJan:hourly_max_t 1.372e-01 7.077e-02 1.938 0.052980 .
## monthJul:hourly_max_t 1.088e-02 1.137e-01 0.096 0.923779
## monthJun:hourly_max_t 1.093e-02 1.173e-01 0.093 0.925749
## monthMar:hourly_max_t 8.580e-02 7.003e-02 1.225 0.220891
## monthMay:hourly_max_t -1.244e-02 8.562e-02 -0.145 0.884487
## monthNov:hourly_max_t 4.300e-02 9.308e-02 0.462 0.644230
## monthOct:hourly_max_t 4.253e-02 9.566e-02 0.445 0.656739
## monthSep:hourly_max_t -1.123e-01 8.950e-02 -1.254 0.210115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.798 on 815 degrees of freedom
## Multiple R-squared: 0.5968, Adjusted R-squared: 0.5825
## F-statistic: 41.6 on 29 and 815 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_8)
##
## Breusch-Godfrey test for serial correlation of order up to 33
##
## data: Residuals
## LM test = 66.346, df = 33, p-value = 0.000508
plot(lm_hour_8)
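Coefficients and Significance:
Lag 1 Production: Highly significant with a positive impact. Because the lag is computed within the hour-8 subset, it is the previous observation at hour 8 (the previous day when no days are missing), so the prior day's value at this hour influences current production.
Trend Hour 8: Significant with a positive impact, suggesting a gradual increase in production over time.
CSNOW Surface: Significant with a negative impact, indicating snowfall negatively affects production.
DLWRF Surface: Highly significant with a negative impact, indicating that higher downward longwave radiation at the surface is associated with lower production.
Hourly Cloud Average: Highly significant with a negative impact, indicating cloud cover reduces production.
Is Ramadan: Negative coefficient but not statistically significant (p = 0.14), so the model gives no clear evidence of lower production during Ramadan.
Monthly Effects: December and its interaction with hourly_max_t are significant at the 5% level; January is significant as a main effect and borderline in its interaction (p = 0.053).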
Residuals and Diagnostics:
Residual Standard Error: 1.798, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.5968, suggesting that the model explains about 59.68% of the variability in production.
Adjusted R-squared: 0.5825, slightly lower than Multiple R-squared, accounting for the number of predictors.
F-statistic: Significant, indicating the model provides a better fit than a model with no predictors.
The WMAPE for hour 8 was found to be 22.69%, indicating a reasonably accurate model.
Hourly Production Data for Hour 8:
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, suggesting not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with deviations, indicating areas where predictions might be off.
For hour 9, a linear regression model was created with the predictors lag_1_production, special_period, DSWRF_surface, CSNOW_surface, DLWRF_surface, is.nationalday, USWRF_surface, hourly_cloud_average, and interactions between month and hourly_max_t.
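# Filter data for hour 9 so that it exists before plotting
hour_9_data <- all_data[all_data$hour == 9, ]
# Plot production for hour 9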
ggplot(hour_9_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 9",
x = "Date",
y = "Production") +
theme_minimal()
# Filter data for hour 9
hour_9_data <- all_data[all_data$hour == 9, ]
hour_9_data <- hour_9_data[,-c(2)]
hour_9_data$trend_hour_9 <- 1:nrow(hour_9_data)
hour_9_data[, lag_1_production := shift(production,1)]
hour_9_data[,lag_1_diff:=production-lag_1_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_9_data <- hour_9_data[!is.na(lag_1_production)]
# Fit linear regression model for hour 9
lm_hour_9 <- lm(production ~+lag_1_production+special_period+DSWRF_surface+CSNOW_surface+DLWRF_surface+is.nationalday+USWRF_surface+hourly_cloud_average+month*hourly_max_t, data = hour_9_data)
# Summarize the model
summary(lm_hour_9)
##
## Call:
## lm(formula = production ~ +lag_1_production + special_period +
## DSWRF_surface + CSNOW_surface + DLWRF_surface + is.nationalday +
## USWRF_surface + hourly_cloud_average + month * hourly_max_t,
## data = hour_9_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4226 -0.9775 0.2714 1.1875 5.3309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.320259 19.054706 -1.749 0.080727 .
## lag_1_production 0.103092 0.028440 3.625 0.000307 ***
## special_period -0.312502 0.211754 -1.476 0.140391
## DSWRF_surface -0.033438 0.015008 -2.228 0.026153 *
## CSNOW_surface -1.403326 0.521675 -2.690 0.007291 **
## DLWRF_surface -0.035314 0.006185 -5.710 1.58e-08 ***
## is.nationalday 1.304794 0.542390 2.406 0.016367 *
## USWRF_surface 0.097334 0.044588 2.183 0.029322 *
## hourly_cloud_average -0.030182 0.005345 -5.647 2.26e-08 ***
## monthAug 9.616202 41.496349 0.232 0.816801
## monthDec -99.793066 32.001599 -3.118 0.001883 **
## monthFeb -13.875015 19.317473 -0.718 0.472802
## monthJan -69.201192 20.682438 -3.346 0.000858 ***
## monthJul 31.871323 32.065862 0.994 0.320551
## monthJun -15.514790 32.084706 -0.484 0.628830
## monthMar -28.662861 20.496925 -1.398 0.162375
## monthMay -18.575056 23.384689 -0.794 0.427238
## monthNov -4.709414 24.821347 -0.190 0.849566
## monthOct -28.069339 29.292627 -0.958 0.338227
## monthSep 32.201308 25.609441 1.257 0.208970
## hourly_max_t 0.179545 0.069815 2.572 0.010296 *
## monthAug:hourly_max_t -0.035523 0.137363 -0.259 0.796001
## monthDec:hourly_max_t 0.360259 0.114939 3.134 0.001784 **
## monthFeb:hourly_max_t 0.052070 0.068715 0.758 0.448810
## monthJan:hourly_max_t 0.250637 0.073880 3.393 0.000726 ***
## monthJul:hourly_max_t -0.104955 0.107968 -0.972 0.331293
## monthJun:hourly_max_t 0.054977 0.108866 0.505 0.613698
## monthMar:hourly_max_t 0.104192 0.072343 1.440 0.150179
## monthMay:hourly_max_t 0.064481 0.080595 0.800 0.423909
## monthNov:hourly_max_t 0.015062 0.087706 0.172 0.863695
## monthOct:hourly_max_t 0.097286 0.101629 0.957 0.338716
## monthSep:hourly_max_t -0.109442 0.087587 -1.250 0.211834
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.966 on 813 degrees of freedom
## Multiple R-squared: 0.5732, Adjusted R-squared: 0.5569
## F-statistic: 35.22 on 31 and 813 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_9)
##
## Breusch-Godfrey test for serial correlation of order up to 35
##
## data: Residuals
## LM test = 25.506, df = 35, p-value = 0.8801
plot(lm_hour_9)
Coefficients and Significance:
Lag 1 Production: Significant with a positive impact. Because the lag is computed within the hour-9 subset, it is the previous observation at hour 9 (the previous day when no days are missing), so the prior day's value at this hour influences current production.
Special Period: Not significant (p = 0.14), suggesting minimal direct impact on the predictions for this hour.
DSWRF Surface: Significant and negative (p = 0.026). The counter-intuitive sign may reflect correlation with the other radiation predictors, such as USWRF_surface, rather than a genuine negative effect.
CSNOW Surface: Significant and negative, suggesting a decrease in production when snow is present.
DLWRF Surface: Highly significant and negative, indicating that higher downward longwave radiation is associated with lower production.
Is National Day: Significant at the 5% level (p = 0.016) and positive, suggesting higher production on national days.
USWRF Surface: Significant and positive, indicating higher production with increasing upward shortwave radiation at the surface.
Hourly Cloud Average: Highly significant and negative, suggesting lower production with higher cloud coverage.
Monthly Effects:
Some months showed significant interactions with hourly_max_t, highlighting seasonal variations in production. For example, December (monthDec) and January (monthJan) had notable interactions with temperature, impacting production levels.
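To read the month-by-hourly_max_t interactions, note that the temperature slope for a given month is the hourly_max_t coefficient plus that month's interaction coefficient. The sketch below works this out for hour 9 using estimates copied from the summary output; it assumes R's default treatment contrasts with April as the reference month, which is inferred from the absence of a monthApr row rather than stated in the report.

```r
# Selected hour 9 estimates, copied from summary(lm_hour_9).
coefs <- c(
  "hourly_max_t"          = 0.179545,
  "monthDec:hourly_max_t" = 0.360259,
  "monthJan:hourly_max_t" = 0.250637
)

# Reference month (assumed to be April): slope is the main effect alone.
slope_apr <- coefs[["hourly_max_t"]]

# Other months: main effect plus the month-specific interaction term.
slope_dec <- coefs[["hourly_max_t"]] + coefs[["monthDec:hourly_max_t"]]
slope_jan <- coefs[["hourly_max_t"]] + coefs[["monthJan:hourly_max_t"]]

print(c(Apr = slope_apr, Dec = slope_dec, Jan = slope_jan))
```

Under these assumptions, a one-unit rise in hourly_max_t is associated with a larger increase in production in December and January than in the reference month. The month main effects shift the intercept in the opposite direction, so the two should be interpreted together.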
Residuals and Diagnostics:
Residual Standard Error: 1.966, the typical size of the residuals (prediction errors).
Multiple R-squared: 0.5732, suggesting that the model explains about 57.32% of the variability in production.
Adjusted R-squared: 0.5569, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 9 was found to be 19.51%, indicating a reasonably accurate model.
Hourly Production Data for Hour 9: The production data for hour 9 is well-populated, showing clearer patterns as the day progresses.
Residuals Analysis:
Top Plot (Residuals over time): Displays the residuals over time, indicating periods of higher residuals and less accurate model predictions.
ACF Plot (Autocorrelation of Residuals): Shows little remaining autocorrelation. The Breusch-Godfrey test above (p = 0.88) finds no evidence of serial correlation in the residuals for this hour.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
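The data preparation steps are repeated almost verbatim for every hour (filter, lag, difference, drop the first row, add a trend). A small helper could capture that pattern in one place. The following is a sketch in base R on a made-up data frame, not the code used in the report: `prepare_hour_data` and the generic `trend` column name are hypothetical, and the report itself uses data.table's `shift()` and hour-specific trend names.

```r
# Build the modelling table for one hour: filter, lag, difference, trend.
prepare_hour_data <- function(all_data, h) {
  d <- all_data[all_data$hour == h, ]
  d <- d[order(d$date), ]                              # ensure chronological order
  d$lag_1_production <- c(NA, head(d$production, -1))  # previous row's production
  d$lag_1_diff <- d$production - d$lag_1_production
  d <- d[!is.na(d$lag_1_production), ]                 # first row has no lag
  d$trend <- seq_len(nrow(d))
  d
}

# Made-up example: 4 days, hours 8 and 9 only.
toy <- data.frame(
  date       = rep(as.Date("2024-01-01") + 0:3, each = 2),
  hour       = rep(8:9, times = 4),
  production = c(1.0, 2.0, 1.5, 2.5, 2.0, 3.5, 2.5, 4.0)
)

h9 <- prepare_hour_data(toy, 9)
print(h9)
```

One design point to note: the helper adds the trend after dropping the first row, as the hour 5 and hour 6 blocks do, whereas the hour 4 and hour 7 to 10 blocks add it before. This only shifts the trend by a constant and does not change the fitted slope.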
For hour 10, the linear regression model included the predictors lag_1_production, CSNOW_surface, DLWRF_surface, hourly_cloud_average, is.weekend, is.religousday, is.nationalday, and interactions between month and hourly_max_t.
# Production data for hour 10
hour_10_data <- all_data[all_data$hour == 10, ]
# Plot production for hour 10
ggplot(hour_10_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 10",
x = "Date",
y = "Production") +
theme_minimal()
# Create a trend variable for hour 10
hour_10_data <- all_data[all_data$hour == 10, ]
hour_10_data <- hour_10_data[,-c(2)]
hour_10_data$trend_hour_10 <- 1:nrow(hour_10_data)
hour_10_data[, lag_1_production := shift(production,1)]
hour_10_data[,lag_1_diff:=production-lag_1_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_10_data <- hour_10_data[!is.na(lag_1_production)]
# Fit linear regression model for hour 10
lm_hour_10 <- lm(production ~+lag_1_production +CSNOW_surface+DLWRF_surface+hourly_cloud_average+is.weekend+is.religousday+is.nationalday+month*hourly_max_t, data = hour_10_data)
summary(lm_hour_10)
##
## Call:
## lm(formula = production ~ +lag_1_production + CSNOW_surface +
## DLWRF_surface + hourly_cloud_average + is.weekend + is.religousday +
## is.nationalday + month * hourly_max_t, data = hour_10_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9511 -0.6960 0.2747 1.0427 5.3855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.917e+01 1.426e+01 -1.344 0.179270
## lag_1_production 8.965e-02 2.806e-02 3.195 0.001454 **
## CSNOW_surface -2.209e+00 5.742e-01 -3.847 0.000129 ***
## DLWRF_surface -3.109e-02 4.798e-03 -6.481 1.58e-10 ***
## hourly_cloud_average -2.250e-02 5.252e-03 -4.284 2.06e-05 ***
## is.weekend -1.388e-01 1.564e-01 -0.887 0.375188
## is.religousday 8.657e-01 4.385e-01 1.974 0.048696 *
## is.nationalday 8.857e-01 5.634e-01 1.572 0.116309
## monthAug -7.828e+00 4.126e+01 -0.190 0.849571
## monthDec -1.390e+02 3.096e+01 -4.490 8.16e-06 ***
## monthFeb 2.690e+00 1.823e+01 0.148 0.882754
## monthJan -8.599e+01 1.956e+01 -4.397 1.25e-05 ***
## monthJul 1.212e+01 2.824e+01 0.429 0.667890
## monthJun -4.578e+01 2.890e+01 -1.584 0.113538
## monthMar 9.752e+00 1.747e+01 0.558 0.576945
## monthMay -1.785e+01 2.130e+01 -0.838 0.402088
## monthNov -1.706e+01 2.146e+01 -0.795 0.426902
## monthOct -2.924e+01 2.730e+01 -1.071 0.284492
## monthSep -4.112e+00 2.326e+01 -0.177 0.859707
## hourly_max_t 1.226e-01 4.964e-02 2.469 0.013744 *
## monthAug:hourly_max_t 2.603e-02 1.339e-01 0.194 0.845904
## monthDec:hourly_max_t 4.979e-01 1.099e-01 4.531 6.75e-06 ***
## monthFeb:hourly_max_t -6.664e-03 6.404e-02 -0.104 0.917149
## monthJan:hourly_max_t 3.105e-01 6.912e-02 4.493 8.05e-06 ***
## monthJul:hourly_max_t -3.802e-02 9.375e-02 -0.406 0.685142
## monthJun:hourly_max_t 1.533e-01 9.689e-02 1.582 0.113938
## monthMar:hourly_max_t -3.196e-02 6.076e-02 -0.526 0.598998
## monthMay:hourly_max_t 5.950e-02 7.247e-02 0.821 0.411856
## monthNov:hourly_max_t 5.964e-02 7.458e-02 0.800 0.424128
## monthOct:hourly_max_t 1.012e-01 9.313e-02 1.087 0.277522
## monthSep:hourly_max_t 1.512e-02 7.789e-02 0.194 0.846119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.04 on 814 degrees of freedom
## Multiple R-squared: 0.5499, Adjusted R-squared: 0.5333
## F-statistic: 33.15 on 30 and 814 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_10)
##
## Breusch-Godfrey test for serial correlation of order up to 34
##
## data: Residuals
## LM test = 66.297, df = 34, p-value = 0.0007542
plot(lm_hour_10)
pacf(hour_10_data$production)
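Lag 1 Production: Significant with a positive impact (p = 0.0015). Because the data are filtered to hour 10 before shift(production, 1) is applied, this lag is the production at hour 10 on the previous day, not the previous hour's production.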
Csnow Surface: Highly significant and negatively impacting, indicating reduced production during snowy conditions.
DLWRF Surface: Highly significant and negatively impacting, indicating reduced production with higher downward longwave radiation.
Hourly Cloud Average: Highly significant and negatively impacting, suggesting lower production with increased cloud cover.
Is Weekend: Not significant (p = 0.375), indicating no detectable weekend effect.
Is Religious Day: Marginally significant with a positive impact (p = 0.049), suggesting slightly higher production during religious holidays.
Is National Day: Not significant (p = 0.116), indicating no detectable effect of national holidays.
Monthly Effects: The December and January interactions with hourly_max_t are highly significant (p < 0.001), as are the corresponding month main effects. The remaining months show no significant interaction, so the seasonal temperature effect is concentrated in winter.
Residual Standard Error: Indicates the variability in the residuals or prediction errors.
Multiple R-squared: 0.5499, suggesting that the model explains about 54.99% of the variability in production.
Adjusted R-squared: 0.5333, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The WMAPE for hour 10 was found to be 17.13%, indicating a reasonably accurate model.
Hourly Production Data for Hour 10: The data shows significant fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, consistent with the Breusch-Godfrey test (LM = 66.297, df = 34, p = 0.00075), so not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
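The relationship between the Multiple R-squared, the Adjusted R-squared, and the F-statistic can be checked directly from the printed summary. The sketch below uses only the values shown in the summary(lm_hour_10) output. The variable names are illustrative, and small differences in the last digit are possible because the printed R-squared is rounded.

```r
# Values read from the summary(lm_hour_10) output
r2     <- 0.5499          # Multiple R-squared (rounded in the printout)
p      <- 30              # numerator df of the F-statistic (estimated slopes)
df_res <- 814             # residual degrees of freedom
n      <- df_res + p + 1  # observations used in the fit (845)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
round(adj_r2, 4)  # 0.5333, matching the printed Adjusted R-squared

# F-statistic implied by the same quantities (close to the printed 33.15)
f_stat <- (r2 / p) / ((1 - r2) / df_res)

# Upper-tail p-value of the overall F-test
p_val <- pf(f_stat, p, df_res, lower.tail = FALSE)
```

The same arithmetic applies to the other hourly models by substituting their R-squared and degrees of freedom.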
# Production data for hour 11
hour_11_data <- all_data[all_data$hour == 11, ]
# Plot production for hour 11
ggplot(hour_11_data, aes(x = date, y = production)) +
geom_line() +
labs(title = "Hourly Production Data for Hour 11",
x = "Date",
y = "Production") +
theme_minimal()
# Create a trend variable for hour 11
hour_11_data <- all_data[all_data$hour == 11, ]
hour_11_data <- hour_11_data[,-c(2)]
hour_11_data$trend_hour_11 <- 1:nrow(hour_11_data)
hour_11_data[, lag_19_production := shift(production,19)]
hour_11_data[,lag_19_diff:=production-lag_19_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_11_data <- hour_11_data[!is.na(lag_19_production)]
# Fit linear regression model for hour 11
lm_hour_11 <- lm(production ~+lag_19_production+trend_hour_11+special_period+CSNOW_surface+DLWRF_surface+TMP_surface+hourly_cloud_average+is.weekend+is.religousday+is.nationalday+month*hourly_max_t, data = hour_11_data)
summary(lm_hour_11)
##
## Call:
## lm(formula = production ~ +lag_19_production + trend_hour_11 +
## special_period + CSNOW_surface + DLWRF_surface + TMP_surface +
## hourly_cloud_average + is.weekend + is.religousday + is.nationalday +
## month * hourly_max_t, data = hour_11_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8293 -0.6438 0.2980 1.0601 6.7454
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.648e+01 1.281e+01 -2.068 0.038998 *
## lag_19_production -6.935e-02 2.616e-02 -2.651 0.008194 **
## trend_hour_11 -7.087e-04 4.135e-04 -1.714 0.086916 .
## special_period 3.271e-01 2.628e-01 1.245 0.213632
## CSNOW_surface -1.956e+00 5.801e-01 -3.373 0.000781 ***
## DLWRF_surface -3.168e-02 4.589e-03 -6.903 1.04e-11 ***
## TMP_surface 1.518e-01 4.405e-02 3.446 0.000599 ***
## hourly_cloud_average -1.728e-02 5.404e-03 -3.197 0.001444 **
## is.weekend -1.295e-01 1.572e-01 -0.823 0.410494
## is.religousday 1.882e-01 4.448e-01 0.423 0.672330
## is.nationalday 3.911e-01 5.597e-01 0.699 0.484986
## monthAug -4.955e+01 4.017e+01 -1.233 0.217756
## monthDec -1.214e+02 2.656e+01 -4.570 5.66e-06 ***
## monthFeb 2.564e+01 1.699e+01 1.509 0.131646
## monthJan -8.820e+01 1.909e+01 -4.620 4.48e-06 ***
## monthJul -8.063e+00 2.623e+01 -0.307 0.758577
## monthJun -1.023e+01 2.567e+01 -0.398 0.690463
## monthMar 1.661e+01 1.596e+01 1.041 0.298382
## monthMay -3.758e+00 1.952e+01 -0.192 0.847411
## monthNov -3.104e+01 1.901e+01 -1.633 0.102942
## monthOct -1.058e+01 2.549e+01 -0.415 0.678376
## monthSep -1.605e+01 2.146e+01 -0.748 0.454965
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t 1.578e-01 1.285e-01 1.228 0.219990
## monthDec:hourly_max_t 4.319e-01 9.323e-02 4.633 4.22e-06 ***
## monthFeb:hourly_max_t -8.750e-02 5.924e-02 -1.477 0.140078
## monthJan:hourly_max_t 3.162e-01 6.705e-02 4.715 2.85e-06 ***
## monthJul:hourly_max_t 2.752e-02 8.608e-02 0.320 0.749326
## monthJun:hourly_max_t 3.347e-02 8.539e-02 0.392 0.695214
## monthMar:hourly_max_t -5.515e-02 5.503e-02 -1.002 0.316592
## monthMay:hourly_max_t 1.027e-02 6.580e-02 0.156 0.876002
## monthNov:hourly_max_t 1.084e-01 6.541e-02 1.657 0.097897 .
## monthOct:hourly_max_t 3.737e-02 8.590e-02 0.435 0.663686
## monthSep:hourly_max_t 5.351e-02 7.098e-02 0.754 0.451115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.03 on 794 degrees of freedom
## Multiple R-squared: 0.5389, Adjusted R-squared: 0.5204
## F-statistic: 29 on 32 and 794 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_11)
##
## Breusch-Godfrey test for serial correlation of order up to 37
##
## data: Residuals
## LM test = 60.556, df = 37, p-value = 0.008602
plot(lm_hour_11)
Model Summary:
Lag 19 Production: Significant with a negative impact (p = 0.008). Because the data are filtered to hour 11 before shift(production, 19) is applied, this is the production at hour 11 nineteen days earlier, not 19 hours earlier.
Trend Hour 11: Only marginally significant (p = 0.087), giving weak evidence of a slight downward trend in production over time for this hour.
Special Period: Not significant, indicating minimal impact of special periods on production during hour 11.
Csnow Surface: Highly significant and negatively impacting, indicating reduced production during snowy conditions.
DLWRF Surface: Highly significant and negatively impacting, indicating reduced production with higher downward longwave radiation.
TMP Surface: Significant with a positive impact, suggesting higher production with increased surface temperature.
Hourly Cloud Average: Significant and negatively impacting, indicating lower production with increased cloud cover.
Is Weekend: Not significant, indicating minimal impact during weekends.
Is Religious Day: Not significant, indicating minimal impact during religious holidays.
Is National Day: Not significant, indicating minimal impact during national holidays.
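Monthly Effects: The December and January interactions with hourly_max_t are highly significant (p < 0.001) and November is marginal (p = 0.098), indicating that the seasonal temperature effect is concentrated in winter. The main effect of hourly_max_t is reported as NA because of a singularity in the design matrix.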
Residual Standard Error: Indicates the variability in the residuals or prediction errors.
Multiple R-squared: 0.5389, suggesting that the model explains about 53.89% of the variability in production.
Adjusted R-squared: 0.5204, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
Hourly Production Data for Hour 11: The data shows significant fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, consistent with the Breusch-Godfrey test (LM = 60.556, df = 37, p = 0.0086), so not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
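Since each hourly model is fitted on data filtered to a single hour, a shift of k rows is a lag of k days at that hour. The sketch below illustrates this on a toy series. The data are hypothetical, and the helper `lag_n` is a base-R stand-in assumed to behave like data.table's shift(x, n) with its default settings.

```r
# Toy hourly series: 30 days x 24 hours (hypothetical data)
toy <- expand.grid(hour = 0:23, day = 1:30)
toy$production <- toy$day * 100 + toy$hour  # value encodes day and hour

# Keep only hour 11, as in the report
h11 <- toy[toy$hour == 11, ]

# Base-R lag: prepend n NAs and drop the last n values
lag_n <- function(x, n) c(rep(NA, n), head(x, -n))

h11$lag_1  <- lag_n(h11$production, 1)
h11$lag_19 <- lag_n(h11$production, 19)

# Row for day 20: lag_1 comes from day 19 and lag_19 from day 1, both at hour 11
h11[h11$day == 20, c("day", "hour", "production", "lag_1", "lag_19")]
```

The first 19 rows of `lag_19` are NA, which is why the corresponding rows are removed before the model is fitted.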
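# Production data for hour 12
hour_12_data <- all_data[all_data$hour == 12, ]
# Plot production for hour 12
ggplot(hour_12_data, aes(x = date, y = production)) +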
geom_line() +
labs(title = "Hourly Production Data for Hour 12",
x = "Date",
y = "Production") +
theme_minimal()
# Create a trend variable for hour 12
hour_12_data <- all_data[all_data$hour == 12, ]
hour_12_data <- hour_12_data[,-c(2)]
hour_12_data[, lag_1_production := shift(production,1)]
hour_12_data[,lag_1_diff:=production-lag_1_production]
hour_12_data <- hour_12_data[!is.na(lag_1_production)]
hour_12_data$trend_hour_12 <- 1:nrow(hour_12_data)
lm_hour_12 <- lm(production ~ +lag_1_production+trend_hour_12+CSNOW_surface+DLWRF_surface+TMP_surface+hourly_cloud_average+is.weekend+is.religousday+is.nationalday+month*hourly_max_t , data = hour_12_data)
summary(lm_hour_12)
##
## Call:
## lm(formula = production ~ +lag_1_production + trend_hour_12 +
## CSNOW_surface + DLWRF_surface + TMP_surface + hourly_cloud_average +
## is.weekend + is.religousday + is.nationalday + month * hourly_max_t,
## data = hour_12_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4179 -0.7232 0.2480 1.1310 6.5360
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.044e+01 1.229e+01 -1.663 0.096784 .
## lag_1_production 9.585e-02 2.853e-02 3.360 0.000816 ***
## trend_hour_12 -6.164e-04 3.318e-04 -1.858 0.063583 .
## CSNOW_surface -1.382e+00 5.373e-01 -2.572 0.010293 *
## DLWRF_surface -3.028e-02 4.373e-03 -6.925 8.87e-12 ***
## TMP_surface 1.248e-01 4.206e-02 2.967 0.003100 **
## hourly_cloud_average -1.783e-02 5.322e-03 -3.349 0.000847 ***
## is.weekend -2.725e-01 1.539e-01 -1.771 0.076955 .
## is.religousday 1.557e-01 4.376e-01 0.356 0.722122
## is.nationalday 4.066e-01 5.543e-01 0.734 0.463386
## monthAug -2.198e+01 4.001e+01 -0.549 0.582889
## monthDec -6.447e+01 2.382e+01 -2.707 0.006934 **
## monthFeb 2.736e+01 1.575e+01 1.737 0.082741 .
## monthJan -6.111e+01 1.709e+01 -3.575 0.000371 ***
## monthJul 1.667e+01 2.441e+01 0.683 0.494851
## monthJun -5.513e+00 2.362e+01 -0.233 0.815540
## monthMar 9.252e+00 1.488e+01 0.622 0.534191
## monthMay 1.227e+01 1.842e+01 0.666 0.505469
## monthNov -2.384e+01 1.779e+01 -1.340 0.180513
## monthOct -1.661e+01 2.434e+01 -0.682 0.495177
## monthSep -1.128e+01 2.041e+01 -0.553 0.580642
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t 7.162e-02 1.269e-01 0.564 0.572701
## monthDec:hourly_max_t 2.275e-01 8.293e-02 2.743 0.006220 **
## monthFeb:hourly_max_t -9.554e-02 5.447e-02 -1.754 0.079774 .
## monthJan:hourly_max_t 2.164e-01 5.950e-02 3.637 0.000293 ***
## monthJul:hourly_max_t -5.148e-02 7.957e-02 -0.647 0.517835
## monthJun:hourly_max_t 1.842e-02 7.820e-02 0.236 0.813818
## monthMar:hourly_max_t -2.977e-02 5.089e-02 -0.585 0.558761
## monthMay:hourly_max_t -4.109e-02 6.177e-02 -0.665 0.506079
## monthNov:hourly_max_t 8.153e-02 6.071e-02 1.343 0.179644
## monthOct:hourly_max_t 5.646e-02 8.141e-02 0.694 0.488148
## monthSep:hourly_max_t 3.811e-02 6.697e-02 0.569 0.569426
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.005 on 813 degrees of freedom
## Multiple R-squared: 0.5433, Adjusted R-squared: 0.5258
## F-statistic: 31.19 on 31 and 813 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_12)
##
## Breusch-Godfrey test for serial correlation of order up to 36
##
## data: Residuals
## LM test = 37.429, df = 36, p-value = 0.4034
plot(lm_hour_12)
Lag 1 Production: Significant with a positive impact (p < 0.001). Because the data are filtered to hour 12, this lag is the production at hour 12 on the previous day, not the production 1 hour prior.
Trend Hour 12: Only marginally significant (p = 0.064) with a negative sign, giving weak evidence of a slight downward trend in production over time for this hour.
Csnow Surface: Significant with a negative impact, indicating reduced production during snowy conditions.
DLWRF Surface: Highly significant and negatively impacting, indicating reduced production with higher downward longwave radiation.
TMP Surface: Significant with a positive impact (p = 0.003), suggesting higher production with increased surface temperature during hour 12.
Hourly Cloud Average: Highly significant and negatively impacting (p < 0.001), indicating lower production with increased cloud cover.
Is Weekend: Only marginally significant (p = 0.077) with a negative sign, giving weak evidence of reduced production during weekends.
Is Religious Day: Not significant, indicating minimal impact during religious holidays.
Is National Day: Not significant, indicating minimal impact during national holidays.
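Monthly Effects: The December (p = 0.006) and January (p < 0.001) interactions with hourly_max_t are significant and February is marginal (p = 0.080), indicating seasonal variations in production that are concentrated in winter.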
Residual Standard Error: Indicates the variability in the residuals or prediction errors.
Multiple R-squared: 0.5433, suggesting that the model explains about 54.33% of the variability in production.
Adjusted R-squared: 0.5258, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
Visualization and Interpretation:
Hourly Production Data for Hour 12: The data shows significant fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): The Breusch-Godfrey test (LM = 37.429, df = 36, p = 0.4034) gives no evidence of serial correlation in the residuals, so any individual spikes in the ACF plot are not jointly significant.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
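The significance wording used in these interpretations can be checked mechanically against the "Signif. codes" legend printed by summary(). The sketch below maps p-values copied from the summary(lm_hour_12) output to the same codes. The object names are illustrative, and the label "not significant" is an assumed name for the blank code in the legend.

```r
# p-values copied from the summary(lm_hour_12) output
p_values <- c(
  lag_1_production     = 0.000816,
  trend_hour_12        = 0.063583,
  CSNOW_surface        = 0.010293,
  DLWRF_surface        = 8.87e-12,
  TMP_surface          = 0.003100,
  hourly_cloud_average = 0.000847,
  is.weekend           = 0.076955,
  is.religousday       = 0.722122,
  is.nationalday       = 0.463386
)

# Same cut points as the "Signif. codes" legend
codes <- cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("***", "**", "*", ".", "not significant"),
  include.lowest = TRUE
)

data.frame(p_value = p_values, code = codes)
```

Under this mapping, trend_hour_12 and is.weekend fall in the "." band (marginal at the 10% level only), while TMP_surface falls in the "**" band.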
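# Production data for hour 13
hour_13_data <- all_data[all_data$hour == 13, ]
# Plot production for hour 13
ggplot(hour_13_data, aes(x = date, y = production)) +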
geom_line() +
labs(title = "Hourly Production Data for Hour 13",
x = "Date",
y = "Production") +
theme_minimal()
# Create a trend variable for hour 13
hour_13_data <- all_data[all_data$hour == 13, ]
hour_13_data <- hour_13_data[,-c(2)]
hour_13_data[, lag_1_production := shift(production,1)]
hour_13_data[,lag_1_diff:=production-lag_1_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_13_data <- hour_13_data[!is.na(lag_1_production)]
# Add a linear time index (trend) after dropping the rows without a lag value
hour_13_data$trend_hour_13 <- 1:nrow(hour_13_data)
lm_hour_13 <- lm(production ~+lag_1_production +trend_hour_13+DLWRF_surface+TMP_surface+hourly_cloud_average+is.weekend+is.nationalday+is.publicholiday+month * hourly_max_t , data = hour_13_data)
summary(lm_hour_13)
##
## Call:
## lm(formula = production ~ +lag_1_production + trend_hour_13 +
## DLWRF_surface + TMP_surface + hourly_cloud_average + is.weekend +
## is.nationalday + is.publicholiday + month * hourly_max_t,
## data = hour_13_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.7359 -1.0069 0.3289 1.2389 6.2663
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.231e+01 1.287e+01 -2.512 0.012212 *
## lag_1_production 9.720e-02 2.816e-02 3.451 0.000587 ***
## trend_hour_13 -6.402e-04 3.405e-04 -1.880 0.060486 .
## DLWRF_surface -3.333e-02 4.420e-03 -7.539 1.26e-13 ***
## TMP_surface 1.659e-01 4.380e-02 3.789 0.000163 ***
## hourly_cloud_average -2.490e-02 5.626e-03 -4.426 1.09e-05 ***
## is.weekend 3.515e+00 1.153e+00 3.049 0.002370 **
## is.nationalday 1.938e+00 7.691e-01 2.520 0.011932 *
## is.publicholiday -3.789e+00 1.164e+00 -3.255 0.001180 **
## monthAug -5.345e+01 4.184e+01 -1.278 0.201715
## monthDec -3.358e+01 2.374e+01 -1.415 0.157499
## monthFeb 2.762e+01 1.617e+01 1.709 0.087897 .
## monthJan -4.076e+01 1.753e+01 -2.325 0.020327 *
## monthJul 2.489e+01 2.358e+01 1.055 0.291520
## monthJun 8.255e+00 2.403e+01 0.344 0.731272
## monthMar 1.101e+01 1.522e+01 0.724 0.469515
## monthMay 2.711e+01 1.873e+01 1.448 0.148040
## monthNov -1.899e+01 1.855e+01 -1.024 0.306284
## monthOct 2.044e+01 2.516e+01 0.813 0.416723
## monthSep 9.972e+00 2.138e+01 0.466 0.641036
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t 1.688e-01 1.323e-01 1.276 0.202312
## monthDec:hourly_max_t 1.187e-01 8.230e-02 1.443 0.149513
## monthFeb:hourly_max_t -9.251e-02 5.564e-02 -1.663 0.096762 .
## monthJan:hourly_max_t 1.461e-01 6.073e-02 2.405 0.016393 *
## monthJul:hourly_max_t -7.784e-02 7.678e-02 -1.014 0.310941
## monthJun:hourly_max_t -2.549e-02 7.948e-02 -0.321 0.748495
## monthMar:hourly_max_t -3.392e-02 5.183e-02 -0.655 0.512972
## monthMay:hourly_max_t -8.854e-02 6.274e-02 -1.411 0.158518
## monthNov:hourly_max_t 6.438e-02 6.312e-02 1.020 0.308030
## monthOct:hourly_max_t -6.743e-02 8.395e-02 -0.803 0.422092
## monthSep:hourly_max_t -3.043e-02 6.992e-02 -0.435 0.663570
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.096 on 814 degrees of freedom
## Multiple R-squared: 0.5608, Adjusted R-squared: 0.5447
## F-statistic: 34.65 on 30 and 814 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_13)
##
## Breusch-Godfrey test for serial correlation of order up to 35
##
## data: Residuals
## LM test = 57.215, df = 35, p-value = 0.0103
plot(lm_hour_13)
Lag 1 Production: Significant with a positive impact (p < 0.001). Because the data are filtered to hour 13, this lag is the production at hour 13 on the previous day, not the production 1 hour prior.
Trend Hour 13: Marginally significant with a negative impact, suggesting a slight downward trend in production over time for this hour.
DLWRF Surface: Highly significant and negatively impacting, indicating reduced production with higher downward longwave radiation.
TMP Surface: Significant with a positive impact, suggesting higher production with increased surface temperature.
Hourly Cloud Average: Significant and negatively impacting, indicating lower production with increased cloud cover.
Is Weekend: Significant with a positive impact, indicating increased production during weekends.
Is National Day: Significant with a positive impact, indicating higher production during national holidays.
Is Public Holiday: Significant with a negative impact, indicating reduced production during public holidays.
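Monthly Effects: Only the January interaction with hourly_max_t is significant at the 5% level (p = 0.016), with February marginal (p = 0.097), so the seasonal variation captured by this term is weaker than in the morning hours.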
Residual Standard Error: Indicates the variability in the residuals or prediction errors.
Multiple R-squared: 0.5608, suggesting that the model explains about 56.08% of the variability in production.
Adjusted R-squared: 0.5447, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 13 is 23.93%, which is less accurate than the models for hours 9 and 10 (19.51% and 17.13%).
Hourly Production Data for Hour 13: The data shows significant fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
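# Production data for hour 14
hour_14_data <- all_data[all_data$hour == 14, ]
# Plot production for hour 14
ggplot(hour_14_data, aes(x = date, y = production)) +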
geom_line() +
labs(title = "Hourly Production Data for Hour 14",
x = "Date",
y = "Production") +
theme_minimal()
hour_14_data <- all_data[all_data$hour == 14, ]
hour_14_data <- hour_14_data[,-c(2)]
hour_14_data$trend_hour_14 <- 1:nrow(hour_14_data)
# Note: despite its name, lag_14_production is a one-row lag (hour 14 on the previous day)
hour_14_data[, lag_14_production := shift(production,1)]
hour_14_data[,lag_14_diff:=production-lag_14_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_14_data <- hour_14_data[!is.na(lag_14_production)]
lm_hour_14 <- lm(production ~ +lag_14_production+ special_period+DLWRF_surface+TMP_surface+is.weekend+is.ramadan+is.religousday+is.nationalday+is.publicholiday+hourly_cloud_average+month*hourly_max_t , data = hour_14_data)
summary(lm_hour_14)
##
## Call:
## lm(formula = production ~ +lag_14_production + special_period +
## DLWRF_surface + TMP_surface + is.weekend + is.ramadan + is.religousday +
## is.nationalday + is.publicholiday + hourly_cloud_average +
## month * hourly_max_t, data = hour_14_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.488 -1.097 0.228 1.213 6.766
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.732e+01 1.263e+01 -1.371 0.17076
## lag_14_production 1.175e-01 2.881e-02 4.078 4.99e-05 ***
## special_period 3.980e-01 2.180e-01 1.826 0.06827 .
## DLWRF_surface -2.542e-02 4.290e-03 -5.925 4.62e-09 ***
## TMP_surface 1.055e-01 4.318e-02 2.443 0.01478 *
## is.weekend 3.513e+00 1.102e+00 3.187 0.00149 **
## is.ramadan 2.437e-01 3.645e-01 0.669 0.50395
## is.religousday 9.014e-03 4.318e-01 0.021 0.98335
## is.nationalday 1.739e+00 7.352e-01 2.366 0.01824 *
## is.publicholiday -3.673e+00 1.113e+00 -3.300 0.00101 **
## hourly_cloud_average -3.190e-02 5.417e-03 -5.888 5.70e-09 ***
## monthAug -3.026e+01 3.544e+01 -0.854 0.39350
## monthDec -1.221e+01 2.328e+01 -0.524 0.60017
## monthFeb 1.662e+01 1.543e+01 1.077 0.28182
## monthJan -3.522e+01 1.672e+01 -2.107 0.03543 *
## monthJul 2.397e+01 2.228e+01 1.076 0.28232
## monthJun -3.196e+01 2.370e+01 -1.348 0.17788
## monthMar 1.270e+01 1.464e+01 0.867 0.38602
## monthMay 2.647e+01 1.855e+01 1.427 0.15400
## monthNov -6.076e+00 1.820e+01 -0.334 0.73856
## monthOct 2.077e+01 2.454e+01 0.846 0.39762
## monthSep 2.979e-01 2.111e+01 0.014 0.98874
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t 9.717e-02 1.126e-01 0.863 0.38857
## monthDec:hourly_max_t 3.702e-02 8.086e-02 0.458 0.64717
## monthFeb:hourly_max_t -5.687e-02 5.313e-02 -1.070 0.28475
## monthJan:hourly_max_t 1.226e-01 5.795e-02 2.116 0.03467 *
## monthJul:hourly_max_t -7.447e-02 7.275e-02 -1.024 0.30635
## monthJun:hourly_max_t 1.060e-01 7.868e-02 1.347 0.17844
## monthMar:hourly_max_t -4.131e-02 4.983e-02 -0.829 0.40732
## monthMay:hourly_max_t -8.723e-02 6.239e-02 -1.398 0.16244
## monthNov:hourly_max_t 1.655e-02 6.213e-02 0.266 0.79002
## monthOct:hourly_max_t -7.260e-02 8.215e-02 -0.884 0.37713
## monthSep:hourly_max_t -5.273e-04 6.924e-02 -0.008 0.99393
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.007 on 812 degrees of freedom
## Multiple R-squared: 0.5797, Adjusted R-squared: 0.5632
## F-statistic: 35.01 on 32 and 812 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_14)
##
## Breusch-Godfrey test for serial correlation of order up to 37
##
## data: Residuals
## LM test = 32.917, df = 37, p-value = 0.6609
plot(lm_hour_14)
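Lag 14 Production: Significant with a positive impact (p < 0.001). Despite its name, this variable is created with shift(production, 1) on data filtered to hour 14, so it is the production at hour 14 on the previous day, not the production 1 hour or 14 hours prior.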
Special Period: Marginally significant with a positive impact, suggesting that special periods might slightly increase production.
DLWRF Surface: Highly significant and negatively impacting, indicating reduced production with higher downward longwave radiation.
TMP Surface: Significant with a positive impact, suggesting higher production with increased surface temperature.
Is Weekend: Significant with a positive impact, indicating increased production during weekends.
Is National Day: Significant with a positive impact, indicating higher production during national holidays.
Is Public Holiday: Significant with a negative impact, indicating reduced production during public holidays.
Hourly Cloud Average: Significant and negatively impacting, indicating lower production with increased cloud cover.
Monthly Effects: Only the January interaction with hourly_max_t is significant at the 5% level (p = 0.035), so the seasonal variation captured by this term is limited for hour 14.
Residual Standard Error: Indicates the variability in the residuals or prediction errors.
Multiple R-squared: 0.5797, suggesting that the model explains about 57.97% of the variability in production.
Adjusted R-squared: 0.5632, slightly lower than Multiple R-squared, accounting for the number of predictors in the model.
F-statistic: Significant, indicating that the model provides a better fit than a model with no predictors.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 14 is 25.24%, a moderate error level that is slightly higher than for hour 13 (23.93%).
Hourly Production Data for Hour 14: The data shows significant fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): The Breusch-Godfrey test (LM = 32.917, df = 37, p = 0.6609) gives no evidence of serial correlation in the residuals, so any individual spikes in the ACF plot are not jointly significant.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
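How a residual autocorrelation test separates well-specified from misspecified models can be illustrated on simulated data. The sketch below is not the test used in the report: checkresiduals() reports a Breusch-Godfrey test, whereas this example substitutes the Ljung-Box test available in base R as Box.test(). All data and object names are hypothetical.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Case A: independent errors
y_a <- 1 + 2 * x + rnorm(n)
# Case B: AR(1) errors with coefficient 0.8 (strong serial correlation)
y_b <- 1 + 2 * x + as.numeric(arima.sim(model = list(ar = 0.8), n = n))

fit_a <- lm(y_a ~ x)
fit_b <- lm(y_b ~ x)

# Ljung-Box test on the residuals; H0: no autocorrelation up to lag 10
test_a <- Box.test(residuals(fit_a), lag = 10, type = "Ljung-Box")
test_b <- Box.test(residuals(fit_b), lag = 10, type = "Ljung-Box")

c(independent = test_a$p.value, ar1 = test_b$p.value)
```

A small p-value, as expected in case B, rejects the hypothesis of uncorrelated residuals. This is the same reading applied to the Breusch-Godfrey p-values printed for each hourly model.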
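# Production data for hour 15
hour_15_data <- all_data[all_data$hour == 15, ]
# Plot production for hour 15
ggplot(hour_15_data, aes(x = date, y = production)) +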
geom_line() +
labs(title = "Hourly Production Data for Hour 15",
x = "Date",
y = "Production") +
theme_minimal()
hour_15_data <- all_data[all_data$hour == 15, ]
hour_15_data <- hour_15_data[,-c(2)]
hour_15_data$trend_hour_15 <- 1:nrow(hour_15_data)
# Note: despite its name, lag_15_production is a one-row lag (hour 15 on the previous day)
hour_15_data[, lag_15_production := shift(production,1)]
hour_15_data[,lag_15_diff:=production-lag_15_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_15_data <- hour_15_data[!is.na(lag_15_production)]
lm_hour_15 <- lm(production ~+lag_15_production+trend_hour_15+special_period+DSWRF_surface+USWRF_top_of_atmosphere+DLWRF_surface+hourly_cloud_average+TMP_surface+is.weekend+is.ramadan+is.publicholiday +month*hourly_max_t , data = hour_15_data)
summary(lm_hour_15)
##
## Call:
## lm(formula = production ~ +lag_15_production + trend_hour_15 +
## special_period + DSWRF_surface + USWRF_top_of_atmosphere +
## DLWRF_surface + hourly_cloud_average + TMP_surface + is.weekend +
## is.ramadan + is.publicholiday + month * hourly_max_t, data = hour_15_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2078 -0.8555 0.0800 0.9873 6.1980
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.692e+01 1.267e+01 -3.705 0.000226 ***
## lag_15_production 1.144e-01 2.990e-02 3.827 0.000140 ***
## trend_hour_15 -7.582e-04 3.227e-04 -2.349 0.019046 *
## special_period 4.856e-01 2.105e-01 2.307 0.021293 *
## DSWRF_surface 3.095e-03 1.333e-03 2.323 0.020444 *
## USWRF_top_of_atmosphere 1.062e-02 2.017e-03 5.268 1.77e-07 ***
## DLWRF_surface -2.499e-02 5.065e-03 -4.933 9.81e-07 ***
## hourly_cloud_average -2.648e-02 4.689e-03 -5.647 2.26e-08 ***
## TMP_surface 1.857e-01 4.515e-02 4.113 4.31e-05 ***
## is.weekend 1.134e+00 6.891e-01 1.645 0.100321
## is.ramadan 3.829e-01 2.979e-01 1.285 0.199111
## is.publicholiday -1.400e+00 6.843e-01 -2.045 0.041138 *
## monthAug 1.587e+01 2.506e+01 0.633 0.526879
## monthDec 2.097e+01 2.065e+01 1.016 0.310038
## monthFeb -2.971e+00 1.339e+01 -0.222 0.824463
## monthJan -2.962e+01 1.485e+01 -1.994 0.046524 *
## monthJul 9.355e+00 1.868e+01 0.501 0.616687
## monthJun -4.056e+01 1.898e+01 -2.137 0.032911 *
## monthMar 1.168e+01 1.288e+01 0.907 0.364749
## monthMay -1.450e+00 1.639e+01 -0.088 0.929560
## monthNov 3.102e+01 1.598e+01 1.942 0.052522 .
## monthOct 2.315e+01 2.085e+01 1.110 0.267157
## monthSep 1.328e+00 1.830e+01 0.073 0.942168
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t -5.224e-02 8.057e-02 -0.648 0.516936
## monthDec:hourly_max_t -7.230e-02 7.216e-02 -1.002 0.316642
## monthFeb:hourly_max_t 1.553e-02 4.650e-02 0.334 0.738468
## monthJan:hourly_max_t 1.108e-01 5.227e-02 2.120 0.034290 *
## monthJul:hourly_max_t -3.180e-02 6.132e-02 -0.519 0.604176
## monthJun:hourly_max_t 1.338e-01 6.344e-02 2.109 0.035283 *
## monthMar:hourly_max_t -3.846e-02 4.421e-02 -0.870 0.384621
## monthMay:hourly_max_t 6.541e-03 5.544e-02 0.118 0.906109
## monthNov:hourly_max_t -1.078e-01 5.469e-02 -1.971 0.049057 *
## monthOct:hourly_max_t -8.094e-02 7.015e-02 -1.154 0.248912
## monthSep:hourly_max_t -6.903e-03 6.036e-02 -0.114 0.908975
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.636 on 811 degrees of freedom
## Multiple R-squared: 0.6816, Adjusted R-squared: 0.6686
## F-statistic: 52.6 on 33 and 811 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_15)
##
## Breusch-Godfrey test for serial correlation of order up to 38
##
## data: Residuals
## LM test = 56.797, df = 38, p-value = 0.02551
plot(lm_hour_15)
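Lag 15 Production: Significant with a positive impact (p < 0.001). Despite its name, this variable is created with shift(production, 1) on data filtered to hour 15, so it is the production at hour 15 on the previous day, not the production 15 hours prior.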
Trend Hour 15: Significant with a negative impact, suggesting a slight downward trend over time.
Special Period: Significant with a positive impact, indicating that special periods might slightly increase production.
DSWRF Surface: Significant with a positive impact, suggesting increased production with higher downward shortwave radiation.
USWRF Top of Atmosphere: Highly significant with a positive impact, indicating higher production with increased upward shortwave radiation.
DLWRF Surface: Highly significant with a negative impact, indicating reduced production with higher downward longwave radiation.
Hourly Cloud Average: Highly significant with a negative impact, indicating lower production with increased cloud cover.
TMP Surface: Marginally significant with a positive impact, suggesting higher production with increased surface temperature.
Is Weekend: Marginally significant with a positive impact, indicating increased production during weekends.
Is Ramadan: Marginally significant with a positive impact, suggesting slightly higher production during Ramadan.
Is Public Holiday: Significant with a negative impact, indicating reduced production during public holidays.
Monthly Effects: The January, June, and November interactions with hourly_max_t are significant at the 5% level, indicating seasonal variation in the temperature effect. The hourly_max_t main effect is reported as NA.
Residual Standard Error: 1.636 on 811 degrees of freedom, which measures the typical size of the residuals (prediction errors).
Multiple R-squared: 0.6816, suggesting that the model explains about 68.16% of the variability in production.
Adjusted R-squared: 0.6686, slightly lower than Multiple R-squared because it accounts for the number of predictors in the model.
F-statistic: 52.6 on 33 and 811 DF (p-value < 2.2e-16), indicating that the model fits better than a model with no predictors.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 15 is calculated as 31.63%, which we consider reasonably good.
Hourly Production Data for Hour 15: The data shows substantial fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates some autocorrelation, consistent with the Breusch-Godfrey test above (LM test = 56.797, df = 38, p-value = 0.02551), suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
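The WMAPE values quoted for each hourly model can be reproduced with a small helper. The report does not show how WMAPE was computed, so the sketch below assumes the common definition (total absolute error divided by total absolute actual value, in percent). The function name `wmape` and the toy numbers are illustrative only.

```r
# Hypothetical helper (not from the report): WMAPE in percent,
# assuming WMAPE = 100 * sum(|actual - predicted|) / sum(|actual|).
wmape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  100 * sum(abs(actual - predicted)) / sum(abs(actual))
}

# Toy values, not taken from the production data
actual    <- c(0, 2, 4, 4)
predicted <- c(1, 2, 3, 5)

# Absolute errors are 1, 0, 1, 1 (sum 3); actuals sum to 10
wmape_toy <- wmape(actual, predicted)
print(wmape_toy)  # 30
```

Applied to a fitted model, a call such as `wmape(hour_15_data$production, fitted(lm_hour_15))` would give an in-sample figure; whether the reported values were computed in-sample or on held-out data is not stated here. Unlike a plain MAPE, this ratio stays defined when individual actual values are zero.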
# Filter data for hour 16
hour_16_data <- all_data[all_data$hour == 16, ]
# Plot production for hour 16
ggplot(hour_16_data, aes(x = date, y = production)) +
  geom_line() +
  labs(title = "Hourly Production Data for Hour 16",
       x = "Date",
       y = "Production") +
  theme_minimal()
# Drop the second column and add a linear time index as the trend term
hour_16_data <- hour_16_data[, -c(2)]
hour_16_data$trend_hour_16 <- seq_len(nrow(hour_16_data))
# Lag by one row within the hour-16 subset: the previous day's production
# at hour 16, assuming one row per day in date order
hour_16_data[, lag_16_production := shift(production, 1)]
hour_16_data[, lag_16_diff := production - lag_16_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_16_data <- hour_16_data[!is.na(lag_16_production)]
lm_hour_16 <- lm(production ~+lag_16_production+trend_hour_16+ special_period+DSWRF_surface+USWRF_top_of_atmosphere+hourly_cloud_average+is.ramadan+month*hourly_max_t, data = hour_16_data)
summary(lm_hour_16)
##
## Call:
## lm(formula = production ~ +lag_16_production + trend_hour_16 +
## special_period + DSWRF_surface + USWRF_top_of_atmosphere +
## hourly_cloud_average + is.ramadan + month * hourly_max_t,
## data = hour_16_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2672 -0.6417 -0.0248 0.5744 8.0767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642e+01 9.709e+00 -1.691 0.09127 .
## lag_16_production 2.812e-01 3.206e-02 8.771 < 2e-16 ***
## trend_hour_16 -7.781e-04 2.578e-04 -3.018 0.00263 **
## special_period 5.210e-01 1.706e-01 3.053 0.00234 **
## DSWRF_surface 5.978e-03 1.043e-03 5.732 1.4e-08 ***
## USWRF_top_of_atmosphere 5.059e-03 1.644e-03 3.078 0.00216 **
## hourly_cloud_average -7.726e-03 3.161e-03 -2.445 0.01471 *
## is.ramadan 4.665e-01 2.410e-01 1.936 0.05323 .
## monthAug 1.889e+01 1.999e+01 0.945 0.34481
## monthDec 7.248e+00 1.833e+01 0.395 0.69268
## monthFeb -6.156e+00 1.149e+01 -0.536 0.59220
## monthJan -1.415e+01 1.269e+01 -1.115 0.26523
## monthJul -1.221e+00 1.635e+01 -0.075 0.94048
## monthJun -1.114e+01 1.600e+01 -0.696 0.48650
## monthMar 7.008e+00 1.108e+01 0.633 0.52723
## monthMay 1.781e+01 1.450e+01 1.228 0.21965
## monthNov 3.108e+01 1.427e+01 2.178 0.02968 *
## monthOct 3.476e+01 1.779e+01 1.954 0.05108 .
## monthSep 5.575e+00 1.619e+01 0.344 0.73060
## hourly_max_t 4.936e-02 3.289e-02 1.501 0.13378
## monthAug:hourly_max_t -6.132e-02 6.497e-02 -0.944 0.34556
## monthDec:hourly_max_t -2.120e-02 6.462e-02 -0.328 0.74294
## monthFeb:hourly_max_t 2.485e-02 4.013e-02 0.619 0.53597
## monthJan:hourly_max_t 5.522e-02 4.490e-02 1.230 0.21915
## monthJul:hourly_max_t 1.735e-03 5.405e-02 0.032 0.97440
## monthJun:hourly_max_t 3.589e-02 5.381e-02 0.667 0.50495
## monthMar:hourly_max_t -2.419e-02 3.829e-02 -0.632 0.52768
## monthMay:hourly_max_t -5.953e-02 4.935e-02 -1.206 0.22805
## monthNov:hourly_max_t -1.057e-01 4.925e-02 -2.146 0.03214 *
## monthOct:hourly_max_t -1.186e-01 6.042e-02 -1.964 0.04990 *
## monthSep:hourly_max_t -1.988e-02 5.387e-02 -0.369 0.71226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.325 on 814 degrees of freedom
## Multiple R-squared: 0.689, Adjusted R-squared: 0.6775
## F-statistic: 60.1 on 30 and 814 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_16)
##
## Breusch-Godfrey test for serial correlation of order up to 34
##
## data: Residuals
## LM test = 77.024, df = 34, p-value = 3.504e-05
plot(lm_hour_16)
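Lag 16 Production: Highly significant with a positive impact. This term is the previous day's production at hour 16 (a one-row shift within the hour-16 data, not the production 16 hours earlier), so yesterday's value at the same hour strongly influences today's.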
Trend Hour 16: Significant with a negative impact, suggesting a slight downward trend over time.
Special Period: Significant with a positive impact, indicating that special periods might slightly increase production.
DSWRF Surface: Highly significant with a positive impact, suggesting increased production with higher downward shortwave radiation.
USWRF Top of Atmosphere: Significant with a positive impact, indicating higher production with increased upward shortwave radiation.
Hourly Cloud Average: Significant with a negative impact, indicating lower production with increased cloud cover.
Is Ramadan: Marginally significant (p = 0.0532) with a positive impact, suggesting slightly higher production during Ramadan.
Monthly Effects: The October and November interactions with hourly_max_t are significant at the 5% level, indicating seasonal variation in the temperature effect.
Residual Standard Error: 1.325 on 814 degrees of freedom, which measures the typical size of the residuals (prediction errors).
Multiple R-squared: 0.689, suggesting that the model explains about 68.9% of the variability in production.
Adjusted R-squared: 0.6775, slightly lower than Multiple R-squared because it accounts for the number of predictors in the model.
F-statistic: 60.1 on 30 and 814 DF (p-value < 2.2e-16), indicating that the model fits better than a model with no predictors.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 16 is calculated as 39.61%, which we consider acceptable.
Hourly Production Data for Hour 16: The data shows substantial fluctuations, indicating variability in production.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates autocorrelation, confirmed by the Breusch-Godfrey test above (LM test = 77.024, df = 34, p-value = 3.504e-05), suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
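Because each model is fitted on a single hour's rows, a one-row shift lags the series by one day rather than by a number of hours. The toy example below illustrates this with base R. The data frame `toy` and its values are made up, and `c(NA, head(x, -1))` stands in for `shift(x, 1)` under the assumption of one row per day in date order.

```r
# Toy hourly series: 3 days x 24 hours (values are illustrative only)
toy <- data.frame(
  day  = rep(1:3, each = 24),
  hour = rep(0:23, times = 3)
)
# Production is 10, 20, 30 at hour 16 on days 1, 2, 3 and zero otherwise
toy$production <- ifelse(toy$hour == 16, toy$day * 10, 0)

# Keep only hour 16, as in the hourly models
hour_16_toy <- toy[toy$hour == 16, ]

# Base-R stand-in for a one-row shift: each row receives the previous row's value
hour_16_toy$lag_16_production <- c(NA, head(hour_16_toy$production, -1))

print(hour_16_toy[, c("day", "hour", "production", "lag_16_production")])
```

Each hour-16 row is paired with the hour-16 value from the day before, which is why the lag coefficients are best read as day-to-day persistence at a fixed hour.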
A log transformation was considered for hour 17. We decided not to use it because the logarithm of zero is undefined, so taking logs would require removing the zero-production rows and would break the continuity of the series.
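The issue with zeros can be seen on a toy vector. The values below are made up, and the `log1p` line shows a common alternative that was not used in this report; it is included only for comparison.

```r
# Toy production series containing zeros (illustrative values only)
production <- c(0, 0.5, 2, 0, 3)

# log(0) is -Inf in R, so zero rows cannot enter a regression on log(production)
log_production <- log(production)
n_unusable <- sum(!is.finite(log_production))

# Dropping zeros, as the commented-out code in this section would do,
# shortens the series and leaves gaps in the daily sequence
kept <- production[production > 0]

# Alternative NOT used in the report: log(1 + x) is finite at zero
log1p_production <- log1p(production)

print(n_unusable)    # 2
print(length(kept))  # 3
```

With gaps in the series, a one-row lag would no longer correspond to the previous day, which is the continuity concern mentioned above.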
# Filter data for hour 17
hour_17_data <- all_data[all_data$hour == 17, ]
# Plot production for hour 17
ggplot(hour_17_data, aes(x = date, y = production)) +
  geom_line() +
  labs(title = "Hourly Production Data for Hour 17",
       x = "Date",
       y = "Production") +
  theme_minimal()
# Drop the second column and add a linear time index as the trend term
hour_17_data <- hour_17_data[, -c(2)]
hour_17_data$trend_hour_17 <- seq_len(nrow(hour_17_data))
# Numeric month from the date column; flag non-winter months (winter = Dec, Jan, Feb)
hour_17_data$nm <- as.numeric(format(hour_17_data$date, "%m"))
hour_17_data$is_not_winter <- as.numeric(!hour_17_data$nm %in% c(12, 1, 2))
hour_17_data <- as.data.table(hour_17_data)
# Log transformation was considered but not used (it would require dropping zero production)
#hour_17_data <- hour_17_data[production > 0]
#hour_17_data[, log_production := log(production)]
# Lag by one row within the hour-17 subset: the previous day's production
# at hour 17, assuming one row per day in date order
hour_17_data[, lag_17_production := shift(production, 1)]
hour_17_data[, lag_17_diff := production - lag_17_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_17_data <- hour_17_data[!is.na(lag_17_production)]
hour_17_data <- hour_17_data[!is.na(lag_17_diff)]
lm_hour_17 <- lm(production ~+lag_17_production+trend_hour_17+special_period+is.ramadan+is.religousday+USWRF_surface+USWRF_top_of_atmosphere+DSWRF_surface+month*hourly_max_t, data = hour_17_data)
summary(lm_hour_17)
##
## Call:
## lm(formula = production ~ +lag_17_production + trend_hour_17 +
## special_period + is.ramadan + is.religousday + USWRF_surface +
## USWRF_top_of_atmosphere + DSWRF_surface + month * hourly_max_t,
## data = hour_17_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2139 -0.2809 -0.0096 0.1888 4.0585
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.306e+00 6.176e+00 -0.373 0.7090
## lag_17_production 5.039e-01 3.057e-02 16.486 < 2e-16 ***
## trend_hour_17 -6.348e-04 1.585e-04 -4.006 6.75e-05 ***
## special_period 2.517e-01 9.986e-02 2.521 0.0119 *
## is.ramadan 2.729e-01 1.400e-01 1.950 0.0516 .
## is.religousday -1.436e-01 1.685e-01 -0.852 0.3943
## USWRF_surface -4.230e-03 2.670e-03 -1.584 0.1135
## USWRF_top_of_atmosphere 1.182e-03 1.181e-03 1.001 0.3170
## DSWRF_surface 3.048e-03 1.289e-03 2.364 0.0183 *
## monthAug 2.927e-01 1.256e+01 0.023 0.9814
## monthDec 3.770e-02 1.107e+01 0.003 0.9973
## monthFeb 6.113e+00 8.106e+00 0.754 0.4510
## monthJan -6.389e-01 7.768e+00 -0.082 0.9345
## monthJul -8.086e+00 1.075e+01 -0.752 0.4521
## monthJun 1.262e+01 1.001e+01 1.261 0.2077
## monthMar 8.033e+00 7.692e+00 1.044 0.2966
## monthMay 1.500e+01 9.482e+00 1.582 0.1140
## monthNov 6.433e+00 9.207e+00 0.699 0.4849
## monthOct 2.789e+00 1.125e+01 0.248 0.8043
## monthSep 9.095e+00 1.017e+01 0.895 0.3713
## hourly_max_t 6.029e-03 2.146e-02 0.281 0.7788
## monthAug:hourly_max_t -7.223e-05 4.135e-02 -0.002 0.9986
## monthDec:hourly_max_t 1.846e-03 3.946e-02 0.047 0.9627
## monthFeb:hourly_max_t -2.127e-02 2.834e-02 -0.751 0.4531
## monthJan:hourly_max_t 3.744e-03 2.748e-02 0.136 0.8917
## monthJul:hourly_max_t 2.613e-02 3.588e-02 0.728 0.4667
## monthJun:hourly_max_t -4.251e-02 3.399e-02 -1.250 0.2115
## monthMar:hourly_max_t -2.874e-02 2.677e-02 -1.074 0.2832
## monthMay:hourly_max_t -5.038e-02 3.256e-02 -1.547 0.1222
## monthNov:hourly_max_t -2.148e-02 3.213e-02 -0.669 0.5039
## monthOct:hourly_max_t -9.678e-03 3.861e-02 -0.251 0.8022
## monthSep:hourly_max_t -3.093e-02 3.428e-02 -0.902 0.3672
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7684 on 813 degrees of freedom
## Multiple R-squared: 0.679, Adjusted R-squared: 0.6668
## F-statistic: 55.48 on 31 and 813 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_17)
##
## Breusch-Godfrey test for serial correlation of order up to 35
##
## data: Residuals
## LM test = 144.34, df = 35, p-value = 3.151e-15
plot(lm_hour_17)
Lag 17 Production: Highly significant with a positive impact. This term is the previous day's production at hour 17 (a one-row shift within the hour-17 data, not the production 17 hours earlier), so yesterday's value at the same hour strongly influences today's.
Trend Hour 17: Highly significant with a negative impact, suggesting a slight downward trend over time.
Special Period: Significant with a positive impact, indicating that special periods might slightly increase production.
Is Ramadan: Marginally significant (p = 0.0516) with a positive impact, suggesting slightly higher production during Ramadan.
USWRF Surface: Not significant (p = 0.1135); the estimated effect of upward shortwave radiation at the surface is negative.
DSWRF Surface: Significant with a positive impact, suggesting increased production with higher downward shortwave radiation.
Monthly Effects: None of the month terms or their interactions with hourly_max_t is significant in this model, so it gives no clear evidence of month-specific temperature effects.
Residual Standard Error: 0.7684 on 813 degrees of freedom, which measures the typical size of the residuals (prediction errors).
Multiple R-squared: 0.679, suggesting that the model explains about 67.9% of the variability in production.
Adjusted R-squared: 0.6668, slightly lower than Multiple R-squared because it accounts for the number of predictors in the model.
F-statistic: 55.48 on 31 and 813 DF (p-value < 2.2e-16), indicating that the model fits better than a model with no predictors.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 17 is calculated as 61.69%, a considerably high error.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates autocorrelation, strongly confirmed by the Breusch-Godfrey test above (LM test = 144.34, df = 35, p-value = 3.151e-15), suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
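checkresiduals() reports a Breusch-Godfrey test for the lm fits above. As a rough, package-free illustration of the same idea, the sketch below simulates a regression with AR(1) errors and checks its residuals with the lag-1 autocorrelation and a Ljung-Box test (`Box.test` from base R's stats package), which is a different test from Breusch-Godfrey. All data are simulated, and the AR coefficient of 0.6 is an arbitrary assumption.

```r
set.seed(1)
n <- 500

# Simulated predictor and autocorrelated (AR(1)) errors; not the production data
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + e

# Ordinary linear regression that ignores the time structure of the errors
fit <- lm(y ~ x)
res <- residuals(fit)

# Lag-1 sample autocorrelation of the residuals (element 1 of $acf is lag 0)
lag1_acf <- acf(res, lag.max = 10, plot = FALSE)$acf[2]

# Ljung-Box test: the null hypothesis is no autocorrelation up to lag 10
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

print(lag1_acf)
print(lb$p.value)
```

A small p-value here, like those reported for hours 16 to 18, means the residuals still carry time structure, so the standard errors and p-values from summary() should be read with some caution.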
# Filter data for hour 18
hour_18_data <- all_data[all_data$hour == 18, ]
# Plot production for hour 18
ggplot(hour_18_data, aes(x = date, y = production)) +
  geom_line() +
  labs(title = "Hourly Production Data for Hour 18",
       x = "Date",
       y = "Production") +
  theme_minimal()
# Drop the second column
hour_18_data <- hour_18_data[, -c(2)]
# Excluding zero-production rows was considered but not used
#data_18_filtered <- hour_18_data[hour_18_data$production != 0, ]
# Linear time index used as the trend term
hour_18_data$trend_hour_18 <- seq_len(nrow(hour_18_data))
# Lag by one row within the hour-18 subset: the previous day's production
# at hour 18, assuming one row per day in date order
hour_18_data[, lag_18_production := shift(production, 1)]
hour_18_data[, lag_18_diff := production - lag_18_production]
# Remove rows with NA in lagged production to ensure the model can run
hour_18_data <- hour_18_data[!is.na(lag_18_production)]
lm_hour_18 <- lm(production ~+lag_18_production +trend_hour_18+special_period+TMP_surface+DSWRF_surface+month*hourly_max_t , data = hour_18_data)
summary(lm_hour_18)
##
## Call:
## lm(formula = production ~ +lag_18_production + trend_hour_18 +
## special_period + TMP_surface + DSWRF_surface + month * hourly_max_t,
## data = hour_18_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7237 -0.0471 -0.0031 0.0060 3.7605
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.021e-01 2.286e+00 -0.045 0.96438
## lag_18_production 1.040e-01 3.482e-02 2.987 0.00290 **
## trend_hour_18 2.088e-05 5.039e-05 0.414 0.67866
## special_period 9.697e-02 3.446e-02 2.814 0.00500 **
## TMP_surface 3.467e-04 8.013e-03 0.043 0.96550
## DSWRF_surface -6.681e-06 1.368e-04 -0.049 0.96107
## monthAug -1.342e+01 4.901e+00 -2.738 0.00632 **
## monthDec 1.598e-01 3.799e+00 0.042 0.96647
## monthFeb 2.223e-01 2.628e+00 0.085 0.93261
## monthJan 2.719e-01 2.754e+00 0.099 0.92137
## monthJul -6.414e+00 4.419e+00 -1.452 0.14698
## monthJun 2.336e+00 4.055e+00 0.576 0.56485
## monthMar 2.718e-01 2.658e+00 0.102 0.91857
## monthMay -1.572e+00 3.689e+00 -0.426 0.67009
## monthNov 4.387e-02 3.569e+00 0.012 0.99020
## monthOct -6.391e-01 4.132e+00 -0.155 0.87710
## monthSep 2.067e-01 3.740e+00 0.055 0.95594
## hourly_max_t NA NA NA NA
## monthAug:hourly_max_t 4.489e-02 1.637e-02 2.742 0.00624 **
## monthDec:hourly_max_t -5.943e-04 1.362e-02 -0.044 0.96520
## monthFeb:hourly_max_t -8.148e-04 9.300e-03 -0.088 0.93020
## monthJan:hourly_max_t -9.999e-04 9.801e-03 -0.102 0.91876
## monthJul:hourly_max_t 2.228e-02 1.491e-02 1.494 0.13545
## monthJun:hourly_max_t -7.369e-03 1.389e-02 -0.530 0.59595
## monthMar:hourly_max_t -9.833e-04 9.352e-03 -0.105 0.91629
## monthMay:hourly_max_t 5.688e-03 1.279e-02 0.445 0.65657
## monthNov:hourly_max_t -2.662e-04 1.263e-02 -0.021 0.98319
## monthOct:hourly_max_t 2.042e-03 1.442e-02 0.142 0.88747
## monthSep:hourly_max_t -8.880e-04 1.282e-02 -0.069 0.94478
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2704 on 817 degrees of freedom
## Multiple R-squared: 0.1987, Adjusted R-squared: 0.1723
## F-statistic: 7.506 on 27 and 817 DF, p-value: < 2.2e-16
checkresiduals(lm_hour_18)
##
## Breusch-Godfrey test for serial correlation of order up to 32
##
## data: Residuals
## LM test = 187.64, df = 32, p-value < 2.2e-16
plot(lm_hour_18)
Lag 18 Production: Significant with a positive impact. This term is the previous day's production at hour 18 (a one-row shift within the hour-18 data, not the production 18 hours earlier), so yesterday's value at the same hour influences today's.
Trend Hour 18: Not significant, suggesting the trend does not have a noticeable effect.
Special Period: Significant with a positive impact, indicating that special periods might slightly increase production.
TMP Surface: Not significant (p = 0.9655), indicating no detectable effect of surface temperature on production at this hour.
DSWRF Surface: Not significant, indicating it does not noticeably affect production.
Monthly Effects: Only August is significant (both its main effect and its interaction with hourly_max_t, p ≈ 0.006); the other months and interactions are not. The hourly_max_t main effect is reported as NA because of a singularity.
Residual Standard Error: 0.2704 on 817 degrees of freedom, which measures the typical size of the residuals (prediction errors).
Multiple R-squared: 0.1987, suggesting that the model explains only about 19.87% of the variability in production.
Adjusted R-squared: 0.1723, lower than Multiple R-squared because it accounts for the number of predictors in the model.
F-statistic: 7.506 on 27 and 817 DF (p-value < 2.2e-16), indicating that the model fits better than a model with no predictors, although the explained variance is low.
The Weighted Mean Absolute Percentage Error (WMAPE) for hour 18 is calculated as 87.70%, a very high error.
Residuals Analysis:
Top Plot (Residuals over time): Shows periods of higher residuals, indicating times when the model predictions were less accurate.
ACF Plot (Autocorrelation of Residuals): Indicates autocorrelation, strongly confirmed by the Breusch-Godfrey test above (LM test = 187.64, df = 32, p-value < 2.2e-16), suggesting that not all patterns in the data are fully captured by the model.
Histogram (Distribution of Residuals): Centered around zero but with some deviations, indicating areas where predictions might be off.
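The hour 18 summary reports the hourly_max_t main effect as NA ("1 not defined because of singularities"). This happens when a column of the design matrix is an exact linear combination of other columns. Which columns are involved in the hour 18 model has not been checked here; the sketch below only reproduces the behaviour on simulated data, with `x1`, `x2`, and `y` as made-up variables.

```r
set.seed(42)
n <- 100

# x2 is an exact linear function of x1 and the intercept, so it adds no information
x1 <- rnorm(n)
x2 <- 2 * x1 + 1
y  <- 3 + 1.5 * x1 + rnorm(n)

# lm() keeps the first of the dependent columns and reports NA for the other
fit <- lm(y ~ x1 + x2)
print(coef(fit))

# Names of coefficients that could not be estimated
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)  # "x2"
```

On a real fit, `alias(fit)` from the stats package lists which terms are linearly dependent, which may help decide whether a predictor should be dropped from the formula.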